[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Abandoned I'll post a draft change this was not meant for general review -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: I second Phil's concerns. The right flow here is to produce JSON and then use code to convert that JSON into the pretty-printed explain we have today. Pranay, can you dump an example explain and JSON like you have here into a Google doc? It's easier to comment on specific parts like that. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Thu, 15 Mar 2018 04:51:52 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: Converting the runtime profile and the explain plan to JSON are two different things. The former is harder and much more contentious in the details. The intent of this JIRA was to address the explain plan to JSON part. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Thu, 15 Mar 2018 04:48:05 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: > into JSON format, so it can be displayed in different formats (portable) I don't think this is machine-readable. We've got things like: "detailPrefix":"| ", and "totalBytes":"4.82MB", and "| limit":100, in there. Those are pretty fundamentally unfriendly, and they don't look like stable APIs. I can't intuit the original motivation here from the JIRA, but if I wanted to build a visualization of the plan in Javascript, or if I wanted to analyze many plans to look at patterns, I'd have a hard time doing this. All the *Node* and *Sink* java files are a Java, object-based representation of the plan. One potential way forward is to serialize those to Thrift, so that we have a clear, consistent schema, and then serialize that to JSON. It may be that TPlanNode is re-usable here, or it may not be. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath VissapragadaGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 14 Mar 2018 17:39:15 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: Maps -> JSON conversion is only for the "explain" part of the query profile since we don't have a schema for it (TExplainResult). That being said, this patch seems to be missing the key part TRuntimeProfileTree -> JSON conversion IIUC. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath VissapragadaGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 14 Mar 2018 17:29:16 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: > I’m excited to see this being worked on! > > Could you share some example serializations with us? > > If the goal is machine-readability, the approach of hand-crafting > JSON via maps seems fragile. I claim that if you want > machine-readability, you need to provide the schema of your output, > with a way of evolving that schema (e.g., a version number). > > How far do we get by serializing TPlanNode, the Thrift, to JSON? > i.e., if we think of the Thrift structure as the source of truth, > could we convert it to a usable JSON representation? If we use > Thrift, then we have a schema of sorts. Evolution via Thrift is > also better understood. Hi Phil, I'm not touching the Thrift part, these changes are to convert the explain of a query plan that is current represented in textual format to convert it into JSON format, so it can be displayed in different formats (portable) I'm currently storing the the query plan as nested maps which gets converted into JSON format by using simpleJSON library calls So a query plan that is displayed in textual format PLAN PLAN-ROOT SINK | 06:TOP-N [LIMIT=100] | order by: dt.d_year ASC, sum(ss_ext_sales_price) DESC, item.i_brand_id ASC | 05:AGGREGATE [FINALIZE] | output: sum(ss_ext_sales_price) | group by: dt.d_year, item.i_brand, item.i_brand_id | 04:HASH JOIN [INNER JOIN] | hash predicates: store_sales.ss_sold_date_sk = dt.d_date_sk | runtime filters: RF000 <- dt.d_date_sk | |--00:SCAN HDFS [tpcds.date_dim dt] | partitions=1/1 files=1 size=9.84MB | predicates: dt.d_moy = 12 | 03:HASH JOIN [INNER JOIN] | hash predicates: store_sales.ss_item_sk = item.i_item_sk | runtime filters: RF002 <- item.i_item_sk | |--02:SCAN HDFS [tpcds.item] | partitions=1/1 files=1 size=4.82MB | predicates: item.i_manufact_id = 436 | 01:SCAN HDFS [tpcds.store_sales] partitions=1824/1824 files=1824 size=326.32MB runtime filters: RF000 -> store_sales.ss_sold_date_sk, RF002 -> store_sales.ss_item_sk will be displayed in JSON like this { "PLANNER:":{ "PLAN FRAGMENT":{ "DisplayLabelDetail":"UNPARTITIONED", "DATA SINK":{ "PLAN-ROOT SINK":{ "prefix":"", "detailPrefix":"| " }, "prefix":"", "detailPrefix":"| " }, "prefix":"", "Label":"MERGING-EXCHANGE", "order by":{ "item.i_brand_id":"ASC", "dt.d_year":"ASC", "sum(ss_ext_sales_price)":"DESC" }, "| limit":100, "detailPrefix":"| ", "PLAN NODE LIST":[ { "prefix":"", "Label":"TOP-N", "order by":{ "item.i_brand_id":"ASC", "dt.d_year":"ASC", "sum(ss_ext_sales_price)":"DESC" }, "PLAN NODE LIST":[ { "output":"sum:merge(ss_ext_sales_price)", "group by":"dt.d_year, item.i_brand, item.i_brand_id", "prefix":"", "Label":"AGGREGATE", "detailPrefix":"| ", "PLAN NODE LIST":[ { "DisplayLabelDetail":"HASH(dt.d_year,item.i_brand,item.i_brand_id)", "prefix":"", "Label":"EXCHANGE", "detailPrefix":"| ", "PLAN NODE LIST":[ { "output":"sum(ss_ext_sales_price)", "group by":"dt.d_year, item.i_brand, item.i_brand_id", "prefix":"", "Label":"AGGREGATE", "detailPrefix":"| ", "PLAN NODE LIST":[ { "DisplayLabelDetail":"INNER JOIN, BROADCAST", "runtime filters":"RF000 <- dt.d_date_sk\n", "hash predicates":[ "store_sales.ss_sold_date_sk = dt.d_date_sk" ], "prefix":"", "Label":"HASH JOIN", "detailPrefix":"| ", "PLAN NODE LIST":[ { "DisplayLabelDetail":"BROADCAST", "prefix":"|--", "Label":"EXCHANGE", "detailPrefix":"| | ", "PLAN NODE LIST":[ { "DisplayLabelDetail":"tpcds.date_dim dt", "predicates":"dt.d_moy = 12", "prefix":"| ", "numPartitions":1, "Label":"SCAN HDFS",
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: I’m excited to see this being worked on! Could you share some example serializations with us? If the goal is machine-readability, the approach of hand-crafting JSON via maps seems fragile. I claim that if you want machine-readability, you need to provide the schema of your output, with a way of evolving that schema (e.g., a version number). How far do we get by serializing TPlanNode, the Thrift, to JSON? i.e., if we think of the Thrift structure as the source of truth, could we convert it to a usable JSON representation? If we use Thrift, then we have a schema of sorts. Evolution via Thrift is also better understood. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath VissapragadaGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 14 Mar 2018 16:56:13 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: > - High-level comment: It seems kind of redundant that we build the > explain string and Map at the same time for every getExplainString*(). > Maybe we should have an intermediate state that we can convert to > both text-plan and JSON depending on the caller. Seems like Map > would be the obvious choice (guava has helper methods to build the > explain string from Map) and obviously we can build a JSON string. > > - Please add a web endpoint for the json profile (something like > query_profile?json) to make it easy to consume and write e-e tests. > > Alex, does the approach sound ok, or do you think it can be done in > any better way? Forgot to mention that String will be removed later, I have only added the new code and not removed the old code for verification purpose. Since this is the draft change I'll do the cleanup if the approach is fine. -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath VissapragadaGerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 14 Mar 2018 16:35:51 + Gerrit-HasComments: No
[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/9614 ) Change subject: DRAFT IMPALA-5973: Provide query plan in JSON format. .. Patch Set 1: - High-level comment: It seems kind of redundant that we build the explain string and Map at the same time for every getExplainString*(). Maybe we should have an intermediate state that we can convert to both text-plan and JSON depending on the caller. Seems like Map would be the obvious choice (guava has helper methods to build the explain string from Map) and obviously we can build a JSON string. - Please add a web endpoint for the json profile (something like query_profile?json) to make it easy to consume and write e-e tests. Alex, does the approach sound ok, or do you think it can be done in any better way? -- To view, visit http://gerrit.cloudera.org:8080/9614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d Gerrit-Change-Number: 9614 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath VissapragadaGerrit-Comment-Date: Wed, 14 Mar 2018 16:31:34 + Gerrit-HasComments: No