[jira] [Assigned] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options

2020-08-04 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-3177:
-

Assignee: Kevin J McCarthy  (was: Grant Henke)

> Expose snapshotTimestampMicros to Spark Read Options
> 
>
> Key: KUDU-3177
> URL: https://issues.apache.org/jira/browse/KUDU-3177
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Reporter: Kevin J McCarthy
>Assignee: Kevin J McCarthy
>Priority: Major
>  Labels: beginner
> Fix For: 1.13.0
>
>
> If a spark application needs to read from the same table multiple times and 
> that table has new records that may come in during the life of the 
> application, you may get inconsistent scan results unless you persist the 
> DataFrame. I'd like to expose snapshotTimestampMicros to the spark read 
> options so I can set a timestamp before the first scan and use that for 
> READ_AT_SNAPSHOT to keep all scans on the same table consistent throughout 
> the run of the application. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options

2020-08-04 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-3177:
-

Assignee: Grant Henke

> Expose snapshotTimestampMicros to Spark Read Options
> 
>
> Key: KUDU-3177
> URL: https://issues.apache.org/jira/browse/KUDU-3177
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Reporter: Kevin J McCarthy
>Assignee: Grant Henke
>Priority: Major
>  Labels: beginner
> Fix For: 1.13.0
>
>
> If a spark application needs to read from the same table multiple times and 
> that table has new records that may come in during the life of the 
> application, you may get inconsistent scan results unless you persist the 
> DataFrame. I'd like to expose snapshotTimestampMicros to the spark read 
> options so I can set a timestamp before the first scan and use that for 
> READ_AT_SNAPSHOT to keep all scans on the same table consistent throughout 
> the run of the application. 





[jira] [Resolved] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options

2020-08-04 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-3177.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

> Expose snapshotTimestampMicros to Spark Read Options
> 
>
> Key: KUDU-3177
> URL: https://issues.apache.org/jira/browse/KUDU-3177
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Reporter: Kevin J McCarthy
>Priority: Major
>  Labels: beginner
> Fix For: 1.13.0
>
>
> If a spark application needs to read from the same table multiple times and 
> that table has new records that may come in during the life of the 
> application, you may get inconsistent scan results unless you persist the 
> DataFrame. I'd like to expose snapshotTimestampMicros to the spark read 
> options so I can set a timestamp before the first scan and use that for 
> READ_AT_SNAPSHOT to keep all scans on the same table consistent throughout 
> the run of the application. 





[jira] [Commented] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options

2020-08-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170998#comment-17170998
 ] 

ASF subversion and git services commented on KUDU-3177:
---

Commit 40289e2a2faa021826b9424864ab2935507bef33 in kudu's branch 
refs/heads/master from kevinmccarthy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=40289e2 ]

[KUDU-3177] Added kudu.snapshotTimestampMicros to kudu spark readOptions
as optional property

Added the property snapshotTimestampMicros to the Spark read options,
which allows consistent scans when the timestamp is set before the first
DataFrame read.

Change-Id: I00862c0e174a964efc6cab0b8141b1ac5a1bebc0
Reviewed-on: http://gerrit.cloudera.org:8080/16276
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke 
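
A minimal sketch of the usage this change is meant to enable, written in
Python for brevity. The option name kudu.snapshotTimestampMicros comes from
the commit subject; the helper snapshot_read_options, the master address and
the table name are hypothetical and only illustrate the idea of capturing
one timestamp and reusing it for every read.

```python
import time


def snapshot_read_options(master, table, snapshot_micros):
    """Build a read-options map that pins scans to a single snapshot.

    Hypothetical helper: only the option keys are meant to match kudu-spark.
    """
    return {
        "kudu.master": master,
        "kudu.table": table,
        # Name taken from the commit subject; microseconds since the epoch.
        "kudu.snapshotTimestampMicros": str(snapshot_micros),
    }


# Capture the timestamp once, before the first scan.
snapshot_micros = time.time_ns() // 1_000

# Placeholder master address and table name.
opts = snapshot_read_options("kudu-master:7051", "my_table", snapshot_micros)

# Assuming a live SparkSession named `spark` with kudu-spark on the classpath,
# every read of the table would then reuse the same options:
#   df = spark.read.format("kudu").options(**opts).load()
```

Because every read passes the same timestamp, repeated scans of the table
should see the same rows even if new records arrive while the application
is running.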


> Expose snapshotTimestampMicros to Spark Read Options
> 
>
> Key: KUDU-3177
> URL: https://issues.apache.org/jira/browse/KUDU-3177
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Reporter: Kevin J McCarthy
>Priority: Major
>  Labels: beginner
>
> If a spark application needs to read from the same table multiple times and 
> that table has new records that may come in during the life of the 
> application, you may get inconsistent scan results unless you persist the 
> DataFrame. I'd like to expose snapshotTimestampMicros to the spark read 
> options so I can set a timestamp before the first scan and use that for 
> READ_AT_SNAPSHOT to keep all scans on the same table consistent throughout 
> the run of the application. 





[jira] [Created] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-04 Thread YifanZhang (Jira)
YifanZhang created KUDU-3180:


 Summary: kudu don't always prefer to flush MRS/DMS that anchor 
more memory
 Key: KUDU-3180
 URL: https://issues.apache.org/jira/browse/KUDU-3180
 Project: Kudu
  Issue Type: Bug
Reporter: YifanZhang
 Attachments: image-2020-08-04-20-26-53-749.png, 
image-2020-08-04-20-28-00-665.png

The current time-based flush policy always gives a flush op a high score if 
we haven't flushed the tablet in a long time, which may lead to starvation of 
ops that could free more memory.

We set -flush_threshold_mb=32 and -flush_threshold_secs=1800 in a cluster and 
found that some small MRS/DMS flushes have a higher perf score than big 
MRS/DMS flushes and compactions, which does not seem reasonable.

[Attached images: image-2020-08-04-20-26-53-749.png, image-2020-08-04-20-28-00-665.png]


