[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-16 Thread Pranay Singh (Code Review)
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Abandoned

I'll post a draft change this was not meant for general review
--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

I second Phil's concerns. The right flow here is to produce JSON and then use 
code to convert that JSON into the pretty-printed explain we have today.

Pranay, can you dump an example explain and JSON like you have here into a 
Google doc? It's easier to comment on specific parts like that.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 15 Mar 2018 04:51:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

Converting the runtime profile and the explain plan to JSON are two different 
things. The former is harder and much more contentious in the details. The 
intent of this JIRA was to address the explain plan to JSON part.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 15 Mar 2018 04:48:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

> into JSON format, so it can be displayed in different formats (portable)

I don't think this is machine-readable. We've got things like:

  "detailPrefix":"|  ",

and

  "totalBytes":"4.82MB",

and

"|  limit":100,

in there. Those are pretty fundamentally unfriendly, and they don't look like 
stable APIs. I can't intuit the original motivation here from the JIRA, but if 
I wanted to build a visualization of the plan in Javascript, or if I wanted to 
analyze many plans to look at patterns, I'd have a hard time doing this.

All the *Node* and *Sink* java files are a Java, object-based representation of 
the plan. One potential way forward is to serialize those to Thrift, so that we 
have a clear, consistent schema, and then serialize that to JSON. It may be 
that TPlanNode is re-usable here, or it may not be.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 14 Mar 2018 17:39:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

Maps -> JSON conversion is only for the "explain" part of the query profile 
since we don't have a schema for it (TExplainResult).  That being said, this 
patch seems to be missing the key part TRuntimeProfileTree -> JSON conversion 
IIUC.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 14 Mar 2018 17:29:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

> I’m excited to see this being worked on!
 >
 > Could you share some example serializations with us?
 >
 > If the goal is machine-readability, the approach of hand-crafting
 > JSON via maps seems fragile. I claim that if you want
 > machine-readability, you need to provide the schema of your output,
 > with a way of evolving that schema (e.g., a version number).
 >
 > How far do we get by serializing TPlanNode, the Thrift, to JSON?
 > i.e., if we think of the Thrift structure as the source of truth,
 > could we convert it to a usable JSON representation? If we use
 > Thrift, then we have a schema of sorts. Evolution via Thrift is
 > also better understood.

Hi Phil,

I'm not touching the Thrift part, these changes are to convert the explain
of a query plan that is current represented in textual format to convert it
into JSON format, so it can be displayed in different formats (portable)

I'm currently storing the the query plan as nested maps which gets converted 
into JSON format by using simpleJSON library calls

So a query plan that is displayed in textual format

 PLAN
PLAN-ROOT SINK
|
06:TOP-N [LIMIT=100]
|  order by: dt.d_year ASC, sum(ss_ext_sales_price) DESC, item.i_brand_id ASC
|
05:AGGREGATE [FINALIZE]
|  output: sum(ss_ext_sales_price)
|  group by: dt.d_year, item.i_brand, item.i_brand_id
|
04:HASH JOIN [INNER JOIN]
|  hash predicates: store_sales.ss_sold_date_sk = dt.d_date_sk
|  runtime filters: RF000 <- dt.d_date_sk
|
|--00:SCAN HDFS [tpcds.date_dim dt]
| partitions=1/1 files=1 size=9.84MB
| predicates: dt.d_moy = 12
|
03:HASH JOIN [INNER JOIN]
|  hash predicates: store_sales.ss_item_sk = item.i_item_sk
|  runtime filters: RF002 <- item.i_item_sk
|
|--02:SCAN HDFS [tpcds.item]
| partitions=1/1 files=1 size=4.82MB
| predicates: item.i_manufact_id = 436
|
01:SCAN HDFS [tpcds.store_sales]
   partitions=1824/1824 files=1824 size=326.32MB
   runtime filters: RF000 -> store_sales.ss_sold_date_sk, RF002 -> 
store_sales.ss_item_sk


will be displayed in JSON like this

{
  "PLANNER:":{
"PLAN FRAGMENT":{
  "DisplayLabelDetail":"UNPARTITIONED",
  "DATA SINK":{
"PLAN-ROOT SINK":{
  "prefix":"",
  "detailPrefix":"|  "
},
"prefix":"",
"detailPrefix":"|  "
  },
  "prefix":"",
  "Label":"MERGING-EXCHANGE",
  "order by":{
"item.i_brand_id":"ASC",
"dt.d_year":"ASC",
"sum(ss_ext_sales_price)":"DESC"
  },
  "|  limit":100,
  "detailPrefix":"|  ",
  "PLAN NODE LIST":[
{
  "prefix":"",
  "Label":"TOP-N",
  "order by":{
"item.i_brand_id":"ASC",
"dt.d_year":"ASC",
"sum(ss_ext_sales_price)":"DESC"
  },
  "PLAN NODE LIST":[ 
{
  "output":"sum:merge(ss_ext_sales_price)",
  "group by":"dt.d_year, item.i_brand, item.i_brand_id",
  "prefix":"",
  "Label":"AGGREGATE",
  "detailPrefix":"|  ",
  "PLAN NODE LIST":[
{ 
  
"DisplayLabelDetail":"HASH(dt.d_year,item.i_brand,item.i_brand_id)",
  "prefix":"",
  "Label":"EXCHANGE",
  "detailPrefix":"|  ",
  "PLAN NODE LIST":[
{
  "output":"sum(ss_ext_sales_price)",
  "group by":"dt.d_year, item.i_brand, item.i_brand_id",
  "prefix":"",
  "Label":"AGGREGATE",
  "detailPrefix":"|  ",
  "PLAN NODE LIST":[
{  
  "DisplayLabelDetail":"INNER JOIN, BROADCAST",
  "runtime filters":"RF000 <- dt.d_date_sk\n",
  "hash predicates":[
"store_sales.ss_sold_date_sk = dt.d_date_sk"
  ],
  "prefix":"",
  "Label":"HASH JOIN",
  "detailPrefix":"|  ",
  "PLAN NODE LIST":[
{
  "DisplayLabelDetail":"BROADCAST",
  "prefix":"|--",
  "Label":"EXCHANGE",
  "detailPrefix":"|  |  ",
  "PLAN NODE LIST":[
{
  "DisplayLabelDetail":"tpcds.date_dim dt",
  "predicates":"dt.d_moy = 12",
  "prefix":"|  ",
  "numPartitions":1,
  "Label":"SCAN HDFS",
 

[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

I’m excited to see this being worked on!

Could you share some example serializations with us?

If the goal is machine-readability, the approach of hand-crafting JSON via maps 
seems fragile. I claim that if you want machine-readability, you need to 
provide the schema of your output, with a way of evolving that schema (e.g., a 
version number).

How far do we get by serializing TPlanNode, the Thrift, to JSON? i.e., if we 
think of the Thrift structure as the source of truth, could we convert it to a 
usable JSON representation? If we use Thrift, then we have a schema of sorts. 
Evolution via Thrift is also better understood.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 14 Mar 2018 16:56:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

> - High-level comment: It seems kind of redundant that we build the
 > explain string and Map at the same time for every getExplainString*().
 > Maybe we should have an intermediate state that we can convert to
 > both text-plan and JSON depending on the caller.  Seems like Map
 > would be the obvious choice (guava has helper methods to build the
 > explain string from Map) and obviously we can build a JSON string.
 >
 > - Please add a web endpoint for the json profile (something like
 > query_profile?json) to make it easy to consume and write e-e tests.
 >
 > Alex, does the approach sound ok, or do you think it can be done in
 > any better way?

Forgot to mention that String will be removed later, I have only added the new 
code and not removed the old code for verification purpose. Since this is the 
draft change I'll do the cleanup if the approach is fine.


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 14 Mar 2018 16:35:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] DRAFT IMPALA-5973: Provide query plan in JSON format.

2018-03-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9614 )

Change subject: DRAFT IMPALA-5973: Provide  query plan in JSON format.
..


Patch Set 1:

- High-level comment: It seems kind of redundant that we build the explain 
string and Map at the same time for every getExplainString*(). Maybe we should 
have an intermediate state that we can convert to both text-plan and JSON 
depending on the caller.  Seems like Map would be the obvious choice (guava has 
helper methods to build the explain string from Map) and obviously we can build 
a JSON string.

- Please add a web endpoint for the json profile (something like 
query_profile?json) to make it easy to consume and write e-e tests.

Alex, does the approach sound ok, or do you think it can be done in any better 
way?


--
To view, visit http://gerrit.cloudera.org:8080/9614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7429ac1e93a9c524233383cafd387056fcf2542d
Gerrit-Change-Number: 9614
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Wed, 14 Mar 2018 16:31:34 +
Gerrit-HasComments: No