[jira] [Assigned] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options
[ https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke reassigned KUDU-3177:

    Assignee: Kevin J McCarthy  (was: Grant Henke)

> Expose snapshotTimestampMicros to Spark Read Options
>
>                 Key: KUDU-3177
>                 URL: https://issues.apache.org/jira/browse/KUDU-3177
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Kevin J McCarthy
>            Assignee: Kevin J McCarthy
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.13.0
>
> If a spark application needs to read from the same table multiple times and
> that table has new records that may come in during the life of the
> application, you may get inconsistent scan results unless you persist the
> DataFrame. I'd like to expose snapshotTimestampMicros to the spark read
> options so I can set a timestamp before the first scan and use that for
> READ_AT_SNAPSHOT to keep all scans on the same table consistent throughout
> the run of the application.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options
[ https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke reassigned KUDU-3177:

    Assignee: Grant Henke

> Expose snapshotTimestampMicros to Spark Read Options
>
>                 Key: KUDU-3177
>                 URL: https://issues.apache.org/jira/browse/KUDU-3177
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Kevin J McCarthy
>            Assignee: Grant Henke
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.13.0
[jira] [Resolved] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options
[ https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-3177.

    Fix Version/s: 1.13.0
       Resolution: Fixed

> Expose snapshotTimestampMicros to Spark Read Options
>
>                 Key: KUDU-3177
>                 URL: https://issues.apache.org/jira/browse/KUDU-3177
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Kevin J McCarthy
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.13.0
[jira] [Commented] (KUDU-3177) Expose snapshotTimestampMicros to Spark Read Options
[ https://issues.apache.org/jira/browse/KUDU-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170998#comment-17170998 ]

ASF subversion and git services commented on KUDU-3177:

Commit 40289e2a2faa021826b9424864ab2935507bef33 in kudu's branch refs/heads/master from kevinmccarthy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=40289e2 ]

[KUDU-3177] Added kudu.snapshotTimestampMicros to kudu spark readOptions as optional property

Added property snapshotTimestampMs to spark read options which will allow
consistent scans when timestamp is set before the first dataFrame read.

Change-Id: I00862c0e174a964efc6cab0b8141b1ac5a1bebc0
Reviewed-on: http://gerrit.cloudera.org:8080/16276
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke

> Expose snapshotTimestampMicros to Spark Read Options
>
>                 Key: KUDU-3177
>                 URL: https://issues.apache.org/jira/browse/KUDU-3177
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Kevin J McCarthy
>            Priority: Major
>              Labels: beginner
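The request in KUDU-3177 rests on one idea: if every scan uses the same snapshot timestamp, rows that arrive later are invisible to all of them. The following is a minimal sketch of that idea only. It is a toy in-memory model and does not use Kudu or Spark; the names ToyTable, Row, scan_latest and scan_at_snapshot are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    commit_ts_micros: int  # time at which the row became visible


class ToyTable:
    """Hypothetical in-memory stand-in for a table with per-row commit timestamps."""

    def __init__(self):
        self._rows = []

    def insert(self, key, commit_ts_micros):
        self._rows.append(Row(key, commit_ts_micros))

    def scan_latest(self):
        # Sees everything written so far, so repeated scans can differ.
        return sorted(r.key for r in self._rows)

    def scan_at_snapshot(self, snapshot_ts_micros):
        # Sees only rows committed at or before the snapshot timestamp.
        return sorted(
            r.key for r in self._rows if r.commit_ts_micros <= snapshot_ts_micros
        )


table = ToyTable()
table.insert(1, commit_ts_micros=100)
table.insert(2, commit_ts_micros=200)

snapshot_ts = 250  # fixed once, before the first scan
first = table.scan_at_snapshot(snapshot_ts)

table.insert(3, commit_ts_micros=300)  # new record arrives mid-application

second = table.scan_at_snapshot(snapshot_ts)  # same rows as the first scan
latest = table.scan_latest()                  # also includes the new record
```

In the toy model, both snapshot scans return the same keys while the unpinned scan picks up the late row, which is the inconsistency the issue describes when the DataFrame is not persisted.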
[jira] [Created] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
YifanZhang created KUDU-3180:

             Summary: kudu don't always prefer to flush MRS/DMS that anchor more memory
                 Key: KUDU-3180
                 URL: https://issues.apache.org/jira/browse/KUDU-3180
             Project: Kudu
          Issue Type: Bug
            Reporter: YifanZhang
         Attachments: image-2020-08-04-20-26-53-749.png, image-2020-08-04-20-28-00-665.png

The current time-based flush policy always gives a flush op a high score if the tablet has not been flushed in a long time, which may lead to starvation of ops that could free more memory.

We set -flush_threshold_mb=32 and -flush_threshold_secs=1800 in a cluster and found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS flushes and compactions, which does not seem reasonable.

!image-2020-08-04-20-26-53-749.png|width=1424,height=317!
!image-2020-08-04-20-28-00-665.png|width=1414,height=327!
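The starvation described in KUDU-3180 can be illustrated with a deliberately simplified scoring function. This is a sketch under stated assumptions, not Kudu's actual formula: toy_flush_score, TIME_BOOST and the two candidate ops are hypothetical, and only the two threshold values are taken from the report.

```python
THRESHOLD_MB = 32      # mirrors -flush_threshold_mb=32 from the report
THRESHOLD_SECS = 1800  # mirrors -flush_threshold_secs=1800 from the report
TIME_BOOST = 100.0     # hypothetical weight, chosen only for illustration


def toy_flush_score(anchored_mb, secs_since_flush):
    """Hypothetical score: the time-based branch wins once a flush is overdue."""
    if secs_since_flush > THRESHOLD_SECS:
        # Grows with age and ignores how much memory the op would free.
        return TIME_BOOST * secs_since_flush / THRESHOLD_SECS
    # Otherwise the score depends only on memory anchored above the threshold.
    return max(0.0, float(anchored_mb - THRESHOLD_MB))


# (anchored_mb, secs_since_flush) for two made-up candidate ops.
candidates = {
    "small_old_mrs": (1, 3600),
    "big_recent_mrs": (128, 60),
}

scores = {name: toy_flush_score(mb, age) for name, (mb, age) in candidates.items()}
picked_by_score = max(scores, key=scores.get)
picked_by_memory = max(candidates, key=lambda name: candidates[name][0])
```

Under these assumed numbers the scheduler would pick the 1 MB store that is overdue for a flush ahead of the 128 MB store, even though flushing the larger one frees far more memory.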