[jira] [Commented] (HIVE-17006) LLAP: Parquet caching v1
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149629#comment-16149629 ] Sergey Shelukhin commented on HIVE-17006: - There are only two new failures. Updated one out file; the spark negative test has passed for me locally. > LLAP: Parquet caching v1 > > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.03.patch, HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149305#comment-16149305 ] Gunther Hagleitner commented on HIVE-17006: --- +1 pending test failures (insert_values_orig_table_use_metadata needs golden file update). > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.03.patch, HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148242#comment-16148242 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884556/HIVE-17006.03.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11020 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=161) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.testCliDriver[spark_stage_max_tasks] (batchId=241) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6608/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6608/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6608/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884556 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.03.patch, HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146102#comment-16146102 ] Sergey Shelukhin commented on HIVE-17006: - [~hagleitn] ping? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127964#comment-16127964 ] Sergey Shelukhin commented on HIVE-17006: - * The fix is not specific to this patch. I noticed it while working on the patch. * Uncopyfying is implied in HIVE-15665, otherwise class names/etc. would collide so it won't be committable without that. BB put will also be added there. * Which error handling? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126595#comment-16126595 ] Gunther Hagleitner commented on HIVE-17006: --- can you create follow ups for the major todos? a) Uncopify ParquetMetadataCacheImpl, BB put for FileMetadataCache, Parquet impl in LlapIo impl... The error handling in LlapCacheAwareFs should be done before commit it seems, that can leave a leak? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126585#comment-16126585 ] Gunther Hagleitner commented on HIVE-17006: --- call it AUTHORITY then? To be less confusing? (I'm guessing empty isn't an option here?) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126584#comment-16126584 ] Gunther Hagleitner commented on HIVE-17006: --- The fix in HDFSUtils to handle case where shim doesn't return file id. Is that specific to this patch? Seems like that's a problem even w/o parquet? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126577#comment-16126577 ] Sergey Shelukhin commented on HIVE-17006: - I have to put something as an authority for the URI. This is as good as anything. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126557#comment-16126557 ] Gunther Hagleitner commented on HIVE-17006: --- Follow would be good for counters. Why do you set the uri to: SCHEME +"://"+SCHEME ? what's the second scheme for? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126148#comment-16126148 ] Sergey Shelukhin commented on HIVE-17006: - Fragment-level IO counters. They are maintained in IO elevator and this doesn't use the elevator, only cache directly. Maybe this needs a follow-up JIRA. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125994#comment-16125994 ] Gunther Hagleitner commented on HIVE-17006: --- {quote} // TODO: we currently pass null counters because this doesn't use LlapRecordReader. // Create counters for non-elevator-using fragments also? {quote} What counters are those? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108259#comment-16108259 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879706/HIVE-17006.02.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11019 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6203/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6203/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6203/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879706 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104890#comment-16104890 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879254/HIVE-17006.01.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=160) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet] (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types] (batchId=158) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6168/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879254 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.patch, > HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103723#comment-16103723 ] Sergey Shelukhin commented on HIVE-17006: - load_dyn_part5 may be related (need to dbl check), the rest are unrelated. [~prasanth_j] do you want to review? A lot of the code for metadata cache is the same as in HIVE-15665, so only parts of the patch need separate review > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102468#comment-16102468 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879056/HIVE-17006.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part5] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6138/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879056 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102342#comment-16102342 ] Sergey Shelukhin commented on HIVE-17006: - RB at https://reviews.apache.org/r/61164/ [~Ferd] I cannot find your username on RB, I can add you if you tell me what it is :) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099361#comment-16099361 ] Sergey Shelukhin commented on HIVE-17006: - Sorry I was on vacation. Will do, after cleaning up the patch and testing more. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071848#comment-16071848 ] Ferdinand Xu commented on HIVE-17006: - Hi [~sershe], thank you for the initial patch. Could you create a review board on PR for this? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)