[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has uploaded a new patch set (#5).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

This patch introduces a new query option, PARQUET_FALLBACK_SCHEMA_RESOLUTION, which allows Parquet files' schemas to be resolved either by name or by position. It's "fallback" because field IDs will eventually be the primary schema resolution scheme, and we don't want to create an option that we would have to rename later. The default is still resolution by position.

I chose to make this a query option because it makes testing easier and makes it quicker to diagnose resolution problems in the field. Users who want to switch the default behavior to resolution by name (like Hive) can use the --default_query_options flag.

This patch also introduces a new test section, SHELL, which can be used to execute shell commands in a .test file. This is useful for copying files into test tables.
Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A testdata/parquet_schema_resolution/README
A testdata/parquet_schema_resolution/switched_map.avsc
A testdata/parquet_schema_resolution/switched_map.json
A testdata/parquet_schema_resolution/switched_map.parq
A testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M tests/common/impala_test_suite.py
M tests/conftest.py
M tests/query_test/test_scanners.py
M tests/util/test_file_parser.py
15 files changed, 349 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/5
--
To view, visit http://gerrit.cloudera.org:8080/2384
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Juan Yu
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Silvius Rus
Gerrit-Reviewer: Skye Wanderman-Milne
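The sketch below illustrates the difference between the two resolution modes. It is a minimal C++ sketch and not Impala's actual scanner code, and every name in it is hypothetical. `FileSchemaNode` is a simplified stand-in that holds only the names of one schema level's fields, in file order. `SchemaResolution` stands in for the Thrift-generated TParquetFallbackSchemaResolution enum. Name matching is done as an exact string comparison, which is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Thrift enum TParquetFallbackSchemaResolution.
enum class SchemaResolution { POSITION = 0, NAME = 1 };

// Hypothetical, simplified view of one level of a Parquet file schema:
// only the names of the child fields, in the order they appear in the file.
struct FileSchemaNode {
  std::vector<std::string> child_names;
};

// Returns the index of the file field that should back a table column, or -1
// if the file has no matching field. Name matching is an exact string
// comparison here, which is an assumption of this sketch.
int ResolveField(const FileSchemaNode& node, const std::string& table_col_name,
                 int table_col_pos, SchemaResolution mode) {
  const int num_fields = static_cast<int>(node.child_names.size());
  if (mode == SchemaResolution::NAME) {
    for (std::size_t i = 0; i < node.child_names.size(); ++i) {
      if (node.child_names[i] == table_col_name) return static_cast<int>(i);
    }
    return -1;
  }
  // Only two modes exist, so anything that reaches this point must be POSITION.
  assert(mode == SchemaResolution::POSITION);
  // By position: the table column's ordinal selects the file field directly.
  return (table_col_pos >= 0 && table_col_pos < num_fields) ? table_col_pos : -1;
}
```

Consider a file whose fields were written in the order (b, a). Resolving by position maps table column a at position 0 to file field 0, which is actually b. Resolving by name maps the same column to file field 1. Switching the option is one way to diagnose this kind of mismatch.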
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2384/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: value
case insensitive check?

http://gerrit.cloudera.org:8080/#/c/2384/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 176: string
I'm not necessarily opposed to this being a string, but why did you do this over an enum as we do elsewhere?
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2384/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: value
> case insensitive check?
Done

http://gerrit.cloudera.org:8080/#/c/2384/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 176: string
> I'm not necessarily opposed to this being a string, but why did you do this
Ah, didn't see that; will change to an enum.
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has uploaded a new patch set (#6).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 368 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/6
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has uploaded a new patch set (#7).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 389 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/7
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2031: TParquetFallbackSchemaResolution::POSITION);
move this DCHECK to L2065. no reason to have a separate if-stmt for it when it can be incorporated into the code control flow. Also, consider getting rid of the 'resolve_by_name' variable.

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: "0"
Why allow the numerical enum value? (Especially given that the enum is not exposed.) I see that other options sometimes allow it and other times don't, so I guess I'm okay either way, but I'm curious about the reasoning.

Line 379: position
in other statuses above, we use CAPS for the option name, no quotes, and also put the numerical value in parentheses. It would be nice to be consistent. (Though I think the parenthesis notation for the number is kinda confusing.)
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: "0"
> Why allow the numerical enum value? (Especially given that the enum is not
I asked her to add this. It's because TQueryOptionsToMap ends up writing out the values as the enum values, so this allows us to parse them back. There are some cases (e.g. I think TQueryOptionsToMap sends the client those strings, and then the client may end up sending them back) where that can be a problem if we don't handle it.
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: "0"
> I asked her to add this. It's because TQueryOptionsToMap ends up writing ou
Okay. I think a short comment near the top of this routine explaining that would be helpful. Also, since these numbers must correspond to the enum values, adding this would be good (or make the code use the enum values directly):

  DCHECK_EQ(TParquetFallbackSchemaResolution::POSITION, 0);
  DCHECK_EQ(TParquetFallbackSchemaResolution::NAME, 1);

Not this change, but is COMPRESSION_CODEC problematic then?
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: "0"
> Okay. I think a short comment near the top of this routine explaining that
Yeah these are good suggestions, and probably COMPRESSION_CODEC could be broken in some cases, though I wasn't able to produce an issue in a few minutes of playing with it. I'd like for us to find a way to avoid this issue in general. I thought about it briefly in the past but didn't have a good solution. I agree it'd be nice to add a comment at the top for now.
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has uploaded a new patch set (#8).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 393 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/8
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2031: TParquetFallbackSchemaResolution::POSITION);
> move this DCHECK to L2065. no reason to have a separate if-stmt for it when
Done. FWIW, I added this extra if-statement so you don't have to read through all of the code below to figure out that there are only two options, but I'm fine with moving it. I'm going to keep 'resolve_by_name' since I use it on L2078.

http://gerrit.cloudera.org:8080/#/c/2384/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 371: "0"
> Yeah these are good suggestions, and probably COMPRESSION_CODEC could be br
I added the comment, and the code now compares directly against the enum values.

Line 379: position
> in other statuses above, we use CAPS for the option name, no quotes, and al
I used caps, but left out the numerical values since they're not really meant for users.
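The sketch below shows roughly what the parsing behavior agreed on in this thread could look like. That behavior is: a case-insensitive check on the value; acceptance of the numeric enum values, so that values written out by TQueryOptionsToMap can be parsed back; comparisons tied to the enum values themselves; and an error message that uses CAPS and omits the numbers.

This is a hypothetical C++ sketch, not the code in query-options.cc. The struct-wrapped enum mimics the style of Thrift-generated C++ enums, and the function `ParseFallbackSchemaResolution` and its error text are made up for illustration. The static_asserts are a compile-time variant of the DCHECK_EQ checks suggested in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical mirror of the Thrift-generated enum, written in the
// struct-wrapped style that Thrift's C++ generator uses for enums.
struct TParquetFallbackSchemaResolution {
  enum type { POSITION = 0, NAME = 1 };
};

// Numeric strings are accepted below because values written out as numbers
// (e.g. by TQueryOptionsToMap) may be sent back. Fail the build if the enum
// numbering ever changes.
static_assert(TParquetFallbackSchemaResolution::POSITION == 0, "POSITION must be 0");
static_assert(TParquetFallbackSchemaResolution::NAME == 1, "NAME must be 1");

// Hypothetical parser for a PARQUET_FALLBACK_SCHEMA_RESOLUTION value. Accepts
// "position" or "name" in any case, or the numeric enum value. On failure,
// fills in 'error' and returns false.
bool ParseFallbackSchemaResolution(const std::string& value,
                                   TParquetFallbackSchemaResolution::type* result,
                                   std::string* error) {
  std::string lower = value;
  std::transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  // Compare against strings derived from the enum values themselves.
  if (lower == "position" ||
      lower == std::to_string(static_cast<int>(TParquetFallbackSchemaResolution::POSITION))) {
    *result = TParquetFallbackSchemaResolution::POSITION;
    return true;
  }
  if (lower == "name" ||
      lower == std::to_string(static_cast<int>(TParquetFallbackSchemaResolution::NAME))) {
    *result = TParquetFallbackSchemaResolution::NAME;
    return true;
  }
  // Valid values listed in CAPS, without quotes or numeric values (illustrative text).
  *error = "Invalid PARQUET_FALLBACK_SCHEMA_RESOLUTION option: " + value +
           ". Valid options are POSITION and NAME.";
  return false;
}
```

With this shape, a value that round-trips through TQueryOptionsToMap as "1" parses back to NAME. If the enum is ever renumbered, the build breaks instead of "0" and "1" silently changing meaning.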
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 8: Code-Review+2

(4 comments)

Please see if Matt wanted to make another pass before committing.

http://gerrit.cloudera.org:8080/#/c/2384/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2056: ordinal
position

Line 2061: ordinal
position (just so we have a consistent terminology).

http://gerrit.cloudera.org:8080/#/c/2384/8/testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test:

Line 203: QUERY
the comments for the other queries were helpful. how about one here.

Line 213: QUERY
and here, to explain what is being tested.
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/2384/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2056: ordinal
> position
Done

Line 2061: ordinal
> position (just so we have a consistent terminology).
Done

http://gerrit.cloudera.org:8080/#/c/2384/8/testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test:

Line 203: QUERY
> the comments for the other queries were helpful. how about one here.
Done

Line 213: QUERY
> and here, to explain what is being tested.
Done
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/2384

to look at the new patch set (#9).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 395 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/9
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

Patch Set 9: Code-Review+1

thanks!
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Hello Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/2384

to look at the new patch set (#10).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 395 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/10
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Hello Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/2384

to look at the new patch set (#11).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 395 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/11
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Hello Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/2384

to look at the new patch set (#12).

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

15 files changed, 395 insertions(+), 18 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/12
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
..

Patch Set 12: Code-Review+2

rebase

--
To view, visit http://gerrit.cloudera.org:8080/2384
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Juan Yu
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Silvius Rus
Gerrit-Reviewer: Skye Wanderman-Milne
Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
..

IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

This patch introduces a new query option, PARQUET_FALLBACK_SCHEMA_RESOLUTION, which allows Parquet files' schemas to be resolved either by name or by position. It's "fallback" because eventually field IDs will be the primary schema resolution scheme, and we don't want to create an option that we will have to rename later. The default is still by position. I chose a query option because it makes testing easier and makes resolution problems easier to diagnose quickly in the field. If users want to switch the default behavior to resolution by name (like Hive), they can use the --default_query_options flag.

This patch also introduces a new test section, SHELL, which can be used to execute shell commands in a .test file. This is useful for copying files into test tables.

Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Reviewed-on: http://gerrit.cloudera.org:8080/2384
Reviewed-by: Skye Wanderman-Milne
Tested-by: Internal Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A testdata/parquet_schema_resolution/README
A testdata/parquet_schema_resolution/switched_map.avsc
A testdata/parquet_schema_resolution/switched_map.json
A testdata/parquet_schema_resolution/switched_map.parq
A testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M tests/common/impala_test_suite.py
M tests/conftest.py
M tests/query_test/test_scanners.py
M tests/util/test_file_parser.py
15 files changed, 395 insertions(+), 18 deletions(-)

Approvals:
Internal Jenkins: Verified
Skye Wanderman-Milne: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/2384
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Silvius Rus
Gerrit-Reviewer: Skye Wanderman-Milne
Re: [Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Can you make sure to parse the integer enum values as well as the string names?

On Tue, Mar 29, 2016 at 2:36 PM Skye Wanderman-Milne (Code Review) <ger...@cloudera.org> wrote:

> Skye Wanderman-Milne has uploaded a new patch set (#6).
>
> Change subject: IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION
> query option
> ..
>
> IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
>
> This patch introduces a new query option,
> PARQUET_FALLBACK_SCHEMA_RESOLUTION, which allows Parquet files' schemas
> to be resolved either by name or by position. It's "fallback" because
> eventually field IDs will be the primary schema resolution scheme, and
> we don't want to create an option that we will have to rename later.
> The default is still by position. I chose a query option because it
> makes testing easier and makes resolution problems easier to diagnose
> quickly in the field. If users want to switch the default behavior to
> resolution by name (like Hive), they can use the
> --default_query_options flag.
>
> This patch also introduces a new test section, SHELL, which can be
> used to execute shell commands in a .test file. This is useful for
> copying files into test tables.
>
> Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
> ---
> M be/src/exec/hdfs-parquet-scanner.cc
> M be/src/exec/hdfs-parquet-scanner.h
> M be/src/service/query-options.cc
> M be/src/service/query-options.h
> M common/thrift/ImpalaInternalService.thrift
> M common/thrift/ImpalaService.thrift
> A testdata/parquet_schema_resolution/README
> A testdata/parquet_schema_resolution/switched_map.avsc
> A testdata/parquet_schema_resolution/switched_map.json
> A testdata/parquet_schema_resolution/switched_map.parq
> A testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
> M tests/common/impala_test_suite.py
> M tests/conftest.py
> M tests/query_test/test_scanners.py
> M tests/util/test_file_parser.py
> 15 files changed, 368 insertions(+), 18 deletions(-)
>
>
> git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/84/2384/6
> --
> To view, visit http://gerrit.cloudera.org:8080/2384
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: newpatchset
> Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
> Gerrit-PatchSet: 6
> Gerrit-Project: Impala
> Gerrit-Branch: cdh5-trunk
> Gerrit-Owner: Skye Wanderman-Milne
> Gerrit-Reviewer: Dan Hecht
> Gerrit-Reviewer: Juan Yu
> Gerrit-Reviewer: Matthew Jacobs
> Gerrit-Reviewer: Michael Ho
> Gerrit-Reviewer: Silvius Rus
> Gerrit-Reviewer: Skye Wanderman-Milne
>
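Matthew's request means that the option should accept either a symbolic name or the enum's integer value. The sketch below illustrates this under stated assumptions. It is not the parsing code in be/src/service/query-options.cc. It assumes the values POSITION = 0 and NAME = 1 (the real values are whatever common/thrift/ImpalaService.thrift defines) and assumes that names are matched case-insensitively. The enum class, the function name, and the error text are hypothetical.

```python
from enum import IntEnum


class ParquetFallbackSchemaResolution(IntEnum):
    # Assumed values; the authoritative definition lives in the Thrift IDL.
    POSITION = 0
    NAME = 1


def parse_fallback_resolution(value: str) -> ParquetFallbackSchemaResolution:
    """Accept an enum name ('name', 'POSITION', ...) or its integer value ('0', '1')."""
    v = value.strip()
    if v.lstrip("-").isdigit():
        # Integer form: only values that correspond to a defined member are valid.
        try:
            return ParquetFallbackSchemaResolution(int(v))
        except ValueError:
            pass  # out of range, or a non-ASCII digit int() rejects
    else:
        # Name form, matched case-insensitively against the member names.
        member = ParquetFallbackSchemaResolution.__members__.get(v.upper())
        if member is not None:
            return member
    raise ValueError(
        f"Invalid PARQUET_FALLBACK_SCHEMA_RESOLUTION: '{value}'. "
        "Valid values are POSITION (0) and NAME (1).")
```

Under this sketch, setting the option to `1` or to `name` selects the same mode, while `2`, `-1`, or an unknown word is rejected instead of being silently mapped to a default. A test that covers both forms is the kind of check promised in the reply that follows.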
Re: [Impala-CR](cdh5-trunk) IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
Yes, will add to the test.

On Tue, Mar 29, 2016 at 3:55 PM, Matthew Jacobs wrote:

> Can you make sure to parse the integer enum values as well as the string
> names?