Re: Strange metadata from Text Reader

2019-06-24 Thread Paul Rogers
Hi All,

To close the loop on this, see the detailed comments in DRILL-7308 which 
Charles kindly filed. There is a code bug in the REST metadata feature itself 
which causes the schema to repeat for every returned record batch, and which 
causes it to display precision and scale for VARCHAR columns. Should be easy to 
fix.

Thanks,
- Paul

 

On Monday, June 24, 2019, 12:21:10 PM PDT, Arina Yelchiyeva 
 wrote:  
 
 It would be good to help to identify the commit that actually caused the bug.
Personally, I don’t recall anything that might have broken this functionality.

Kind regards,
Arina

> On Jun 24, 2019, at 10:19 PM, Charles Givre  wrote:
> 
> I don't have that version of Drill anymore but this feature worked correctly 
> until recently.  I'm using the latest build of Drill. 
> 
>> On Jun 24, 2019, at 3:18 PM, Arina Yelchiyeva  
>> wrote:
>> 
>> Just to confirm, in Drill 1.15 it works correctly?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
>>> 
>>> Hi Arina, 
>>> It doesn't seem to make a difference unfortunately. :-(
>>> --C 
>>> 
 On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
 wrote:
 
 Hi Charles,
 
 Please try with v3 reader enabled: set 
 `exec.storage.enable_v3_text_reader` = true.
 Does it behave the same?
 
 Kind regards,
 Arina
 
> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm noticing some strange behavior with the newest version of Drill.  If 
> you query a CSV file, you get the following metadata:
> 
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> 
> {
> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
> "columns": [
> "domain"
> ],
> "rows": [
> {
>  "domain": "thedataist.com"
> }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> "queryState": "COMPLETED",
> "attemptedAutoLimit": 0
> }
> 
> 
> There are two issues here:
> 1.  VARCHAR now has precision 
> 2.  There are twice as many columns as there should be.
> 
> Additionally, if you query a regular CSV, without the columns extracted, 
> you get the following:
> 
> "rows": [
> {
>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
> }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> 
> This is bizarre in that the data type is not being reported correctly, it 
> should be LIST or something like that, AND we're getting too many columns 
> in the metadata.  I'll submit a JIRA as well, but could someone please 
> take a look?
> Thanks,
> -- C
> 
> 
> 
 
>>> 
>> 
> 
  

[GitHub] [drill] paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework

2019-06-24 Thread GitBox
paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505286184
 
 
   @arina-ielchiieva, regarding `enableSchemaBatch`, recall that Java boolean 
variables are, by definition in the language spec, set to false. So, since we 
never set it to true, except in tests, it always defaults to false.
   
   Since this was confusing, added Javadoc to explain the problem and the 
default setting of the option. Does this new material answer your question?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on issue #1809: DRILL-6951: Row set based mock data source

2019-06-24 Thread GitBox
paul-rogers commented on issue #1809: DRILL-6951: Row set based mock data source
URL: https://github.com/apache/drill/pull/1809#issuecomment-505257599
 
 
   Squashed commits and rebased on latest master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Strange metadata from Text Reader

2019-06-24 Thread Paul Rogers
Hi Charles,

Latest master? Please file a JIRA with repo steps. I’ll take a look.

- Paul

Sent from my iPhone

> On Jun 24, 2019, at 11:38 AM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> 
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> 
> {
>  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>  "columns": [
>"domain"
>  ],
>  "rows": [
>{
>  "domain": "thedataist.com"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
>  "queryState": "COMPLETED",
>  "attemptedAutoLimit": 0
> }
> 
> 
> There are two issues here:
> 1.  VARCHAR now has precision 
> 2.  There are twice as many columns as there should be.
> 
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> 
> "rows": [
>{
>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
> 
> This is bizarre in that the data type is not being reported correctly, it 
> should be LIST or something like that, AND we're getting too many columns in 
> the metadata.  I'll submit a JIRA as well, but could someone please take a 
> look?
> Thanks,
> -- C
> 
> 
> 



Re: Multi char csv delimiter

2019-06-24 Thread Khurram Faraaz
Hi Matthias,

Like Paul mentioned, that information with examples can be found here.
Our documentation mentions that, "As of Drill 1.8, Drill supports
multi-byte delimiters, such as \r\n. "
https://drill.apache.org/docs/plugin-configuration-basics/

Thanks,
Khurram

On Mon, Jun 24, 2019 at 2:31 PM Paul Rogers 
wrote:

> Hi Matthias,
>
> Field delimiters, quotes and quote escapes can be only one character. The
> line delimiter can be multi.
>
> Are you setting the line delimiter?
>
> - Paul
>
> Sent from my iPhone
>
> > On Jun 24, 2019, at 12:10 PM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
> >
> > Hi Matthias,
> >
> > Attachments are not supported on the mailing list, please include text
> describing your configuration.
> >
> > Kind regards,
> > Arina
> >
> >> On Jun 24, 2019, at 2:21 PM, Rosenthaler Matthias (PS-DI/ETF1.1)
>  wrote:
> >>
> >> Hi,
> >>
> >> It seems that multi char delimiter “\n\r” is not supported for csv
> format drill 1.16.
> >> The documentation mentions it should work, but it does not work for me.
> It always says “invalid JSON syntax” if I try to change the storage plugin
> configuration.
> >>
> >>
> >>
> >> Mit freundlichen Grüßen / Best regards
> >>
> >> Matthias Rosenthaler
> >>
> >> Powertrain Solutions, Engine Testing (PS-DI/ETF1.1)
> >> Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA |
> www.bosch.at 
> >> Tel. +43 732 7667-479 | matthias.rosentha...@at.bosch.com  matthias.rosentha...@at.bosch.com>
> >>
> >> Sitz: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17
> , Registergericht: FN 55722 w HG-Wien
> >> Aufsichtsratsvorsitzender: Dr. Uwe Thomas; Geschäftsführung: Dr. Klaus
> Peter Fouquet
> >> DVR-Nr.: 0418871- ARA-Lizenz-Nr.: 1831 - UID-Nr.: ATU14719303 -
> Steuernummer 140/4988
> >
>
>


Re: Multi char csv delimiter

2019-06-24 Thread Paul Rogers
Hi Matthias,

Field delimiters, quotes and quote escapes can be only one character. The line 
delimiter can be multi.

Are you setting the line delimiter?

- Paul

Sent from my iPhone

> On Jun 24, 2019, at 12:10 PM, Arina Yelchiyeva  
> wrote:
> 
> Hi Matthias,
> 
> Attachments are not supported on the mailing list, please include text 
> describing your configuration.
> 
> Kind regards,
> Arina
> 
>> On Jun 24, 2019, at 2:21 PM, Rosenthaler Matthias (PS-DI/ETF1.1) 
>>  wrote:
>> 
>> Hi,
>> 
>> It seems that multi char delimiter “\n\r” is not supported for csv format 
>> drill 1.16.
>> The documentation mentions it should work, but it does not work for me. It 
>> always says “invalid JSON syntax” if I try to change the storage plugin 
>> configuration.
>> 
>> 
>> 
>> Mit freundlichen Grüßen / Best regards 
>> 
>> Matthias Rosenthaler
>> 
>> Powertrain Solutions, Engine Testing (PS-DI/ETF1.1) 
>> Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | www.bosch.at 
>>  
>> Tel. +43 732 7667-479 | matthias.rosentha...@at.bosch.com 
>>  
>> 
>> Sitz: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17 , 
>> Registergericht: FN 55722 w HG-Wien
>> Aufsichtsratsvorsitzender: Dr. Uwe Thomas; Geschäftsführung: Dr. Klaus Peter 
>> Fouquet
>> DVR-Nr.: 0418871- ARA-Lizenz-Nr.: 1831 - UID-Nr.: ATU14719303 - Steuernummer 
>> 140/4988
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
It would be good to help to identify the commit that actually caused the bug.
Personally, I don’t recall anything that might have broken this functionality.

Kind regards,
Arina

> On Jun 24, 2019, at 10:19 PM, Charles Givre  wrote:
> 
> I don't have that version of Drill anymore but this feature worked correctly 
> until recently.  I'm using the latest build of Drill. 
> 
>> On Jun 24, 2019, at 3:18 PM, Arina Yelchiyeva  
>> wrote:
>> 
>> Just to confirm, in Drill 1.15 it works correctly?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
>>> 
>>> Hi Arina, 
>>> It doesn't seem to make a difference unfortunately. :-(
>>> --C 
>>> 
 On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
 wrote:
 
 Hi Charles,
 
 Please try with v3 reader enabled: set 
 `exec.storage.enable_v3_text_reader` = true.
 Does it behave the same?
 
 Kind regards,
 Arina
 
> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm noticing some strange behavior with the newest version of Drill.  If 
> you query a CSV file, you get the following metadata:
> 
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> 
> {
> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
> "columns": [
> "domain"
> ],
> "rows": [
> {
>  "domain": "thedataist.com"
> }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> "queryState": "COMPLETED",
> "attemptedAutoLimit": 0
> }
> 
> 
> There are two issues here:
> 1.  VARCHAR now has precision 
> 2.  There are twice as many columns as there should be.
> 
> Additionally, if you query a regular CSV, without the columns extracted, 
> you get the following:
> 
> "rows": [
> {
>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
> }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> 
> This is bizarre in that the data type is not being reported correctly, it 
> should be LIST or something like that, AND we're getting too many columns 
> in the metadata.  I'll submit a JIRA as well, but could someone please 
> take a look?
> Thanks,
> -- C
> 
> 
> 
 
>>> 
>> 
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Charles Givre
I don't have that version of Drill anymore but this feature worked correctly 
until recently.  I'm using the latest build of Drill. 

> On Jun 24, 2019, at 3:18 PM, Arina Yelchiyeva  
> wrote:
> 
> Just to confirm, in Drill 1.15 it works correctly?
> 
> Kind regards,
> Arina
> 
>> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
>> 
>> Hi Arina, 
>> It doesn't seem to make a difference unfortunately. :-(
>> --C 
>> 
>>> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
>>> wrote:
>>> 
>>> Hi Charles,
>>> 
>>> Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` 
>>> = true.
>>> Does it behave the same?
>>> 
>>> Kind regards,
>>> Arina
>>> 
 On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
 
 Hello Drill Devs,
 I'm noticing some strange behavior with the newest version of Drill.  If 
 you query a CSV file, you get the following metadata:
 
 SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
 
 {
 "queryId": "22eee85f-c02c-5878-9735-091d18788061",
 "columns": [
 "domain"
 ],
 "rows": [
 {
   "domain": "thedataist.com"
 }
 ],
 "metadata": [
 "VARCHAR(0, 0)",
 "VARCHAR(0, 0)"
 ],
 "queryState": "COMPLETED",
 "attemptedAutoLimit": 0
 }
 
 
 There are two issues here:
 1.  VARCHAR now has precision 
 2.  There are twice as many columns as there should be.
 
 Additionally, if you query a regular CSV, without the columns extracted, 
 you get the following:
 
 "rows": [
 {
   "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
 }
 ],
 "metadata": [
 "VARCHAR(0, 0)",
 "VARCHAR(0, 0)"
 ],
 
 This is bizarre in that the data type is not being reported correctly, it 
 should be LIST or something like that, AND we're getting too many columns 
 in the metadata.  I'll submit a JIRA as well, but could someone please 
 take a look?
 Thanks,
 -- C
 
 
 
>>> 
>> 
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
Just to confirm, in Drill 1.15 it works correctly?

Kind regards,
Arina

> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
> 
> Hi Arina, 
> It doesn't seem to make a difference unfortunately. :-(
> --C 
> 
>> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
>> wrote:
>> 
>> Hi Charles,
>> 
>> Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` 
>> = true.
>> Does it behave the same?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
>>> 
>>> Hello Drill Devs,
>>> I'm noticing some strange behavior with the newest version of Drill.  If 
>>> you query a CSV file, you get the following metadata:
>>> 
>>> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>>> 
>>> {
>>> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>>> "columns": [
>>>  "domain"
>>> ],
>>> "rows": [
>>>  {
>>>"domain": "thedataist.com"
>>>  }
>>> ],
>>> "metadata": [
>>>  "VARCHAR(0, 0)",
>>>  "VARCHAR(0, 0)"
>>> ],
>>> "queryState": "COMPLETED",
>>> "attemptedAutoLimit": 0
>>> }
>>> 
>>> 
>>> There are two issues here:
>>> 1.  VARCHAR now has precision 
>>> 2.  There are twice as many columns as there should be.
>>> 
>>> Additionally, if you query a regular CSV, without the columns extracted, 
>>> you get the following:
>>> 
>>> "rows": [
>>>  {
>>>"columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>>>  }
>>> ],
>>> "metadata": [
>>>  "VARCHAR(0, 0)",
>>>  "VARCHAR(0, 0)"
>>> ],
>>> 
>>> This is bizarre in that the data type is not being reported correctly, it 
>>> should be LIST or something like that, AND we're getting too many columns 
>>> in the metadata.  I'll submit a JIRA as well, but could someone please take 
>>> a look?
>>> Thanks,
>>> -- C
>>> 
>>> 
>>> 
>> 
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Charles Givre
Hi Arina, 
It doesn't seem to make a difference unfortunately. :-(
--C 

> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
> wrote:
> 
> Hi Charles,
> 
> Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` = 
> true.
> Does it behave the same?
> 
> Kind regards,
> Arina
> 
>> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
>> 
>> Hello Drill Devs,
>> I'm noticing some strange behavior with the newest version of Drill.  If you 
>> query a CSV file, you get the following metadata:
>> 
>> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>> 
>> {
>> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>> "columns": [
>>   "domain"
>> ],
>> "rows": [
>>   {
>> "domain": "thedataist.com"
>>   }
>> ],
>> "metadata": [
>>   "VARCHAR(0, 0)",
>>   "VARCHAR(0, 0)"
>> ],
>> "queryState": "COMPLETED",
>> "attemptedAutoLimit": 0
>> }
>> 
>> 
>> There are two issues here:
>> 1.  VARCHAR now has precision 
>> 2.  There are twice as many columns as there should be.
>> 
>> Additionally, if you query a regular CSV, without the columns extracted, you 
>> get the following:
>> 
>> "rows": [
>>   {
>> "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>>   }
>> ],
>> "metadata": [
>>   "VARCHAR(0, 0)",
>>   "VARCHAR(0, 0)"
>> ],
>> 
>> This is bizarre in that the data type is not being reported correctly, it 
>> should be LIST or something like that, AND we're getting too many columns in 
>> the metadata.  I'll submit a JIRA as well, but could someone please take a 
>> look?
>> Thanks,
>> -- C
>> 
>> 
>> 
> 



Re: Multi char csv delimiter

2019-06-24 Thread Arina Yelchiyeva
Hi Matthias,

Attachments are not supported on the mailing list, please include text 
describing your configuration.

Kind regards,
Arina

> On Jun 24, 2019, at 2:21 PM, Rosenthaler Matthias (PS-DI/ETF1.1) 
>  wrote:
> 
> Hi,
>  
> It seems that multi char delimiter “\n\r” is not supported for csv format 
> drill 1.16.
> The documentation mentions it should work, but it does not work for me. It 
> always says “invalid JSON syntax” if I try to change the storage plugin 
> configuration.
>  
> 
>  
> Mit freundlichen Grüßen / Best regards 
> 
> Matthias Rosenthaler
> 
> Powertrain Solutions, Engine Testing (PS-DI/ETF1.1) 
> Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | www.bosch.at 
>  
> Tel. +43 732 7667-479 | matthias.rosentha...@at.bosch.com 
>  
> 
> Sitz: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17 , 
> Registergericht: FN 55722 w HG-Wien
> Aufsichtsratsvorsitzender: Dr. Uwe Thomas; Geschäftsführung: Dr. Klaus Peter 
> Fouquet
> DVR-Nr.: 0418871- ARA-Lizenz-Nr.: 1831 - UID-Nr.: ATU14719303 - Steuernummer 
> 140/4988



Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
Hi Charles,

Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` = 
true.
Does it behave the same?

Kind regards,
Arina

> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> 
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> 
> {
>  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>  "columns": [
>"domain"
>  ],
>  "rows": [
>{
>  "domain": "thedataist.com"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
>  "queryState": "COMPLETED",
>  "attemptedAutoLimit": 0
> }
> 
> 
> There are two issues here:
> 1.  VARCHAR now has precision 
> 2.  There are twice as many columns as there should be.
> 
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> 
> "rows": [
>{
>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
> 
> This is bizarre in that the data type is not being reported correctly, it 
> should be LIST or something like that, AND we're getting too many columns in 
> the metadata.  I'll submit a JIRA as well, but could someone please take a 
> look?
> Thanks,
> -- C
> 
> 
> 



Multi char csv delimiter

2019-06-24 Thread Rosenthaler Matthias (PS-DI/ETF1.1)
Hi,

It seems that multi char delimiter "\n\r" is not supported for csv format drill 
1.16.
The documentation mentions it should work, but it does not work for me. It 
always says "invalid JSON syntax" if I try to change the storage plugin 
configuration.

[cid:image001.png@01D52A8F.B9475F50]

Mit freundlichen Grüßen / Best regards

Matthias Rosenthaler

Powertrain Solutions, Engine Testing (PS-DI/ETF1.1)
Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | 
www.bosch.at
Tel. +43 732 7667-479 | 
matthias.rosentha...@at.bosch.com

Sitz: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17 , 
Registergericht: FN 55722 w HG-Wien
Aufsichtsratsvorsitzender: Dr. Uwe Thomas; Geschäftsführung: Dr. Klaus Peter 
Fouquet
DVR-Nr.: 0418871- ARA-Lizenz-Nr.: 1831 - UID-Nr.: ATU14719303 - Steuernummer 
140/4988




[jira] [Created] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7308:


 Summary: Incorrect Metadata from text file queries
 Key: DRILL-7308
 URL: https://issues.apache.org/jira/browse/DRILL-7308
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.17.0
Reporter: Charles Givre
 Attachments: domains.csvh

I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:
 
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
 
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
 
 
There are two issues here:
1.  VARCHAR now has precision 
2.  There are twice as many columns as there should be.
 
Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:
 
"rows": [
    {
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Strange metadata from Text Reader

2019-06-24 Thread Charles Givre
Hello Drill Devs,
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:

SELECT * FROM dfs.test.`domains.csvh` LIMIT 1

{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
"domain"
  ],
  "rows": [
{
  "domain": "thedataist.com"
}
  ],
  "metadata": [
"VARCHAR(0, 0)",
"VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}


There are two issues here:
1.  VARCHAR now has precision 
2.  There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:

"rows": [
{
  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
}
  ],
  "metadata": [
"VARCHAR(0, 0)",
"VARCHAR(0, 0)"
  ],

This is bizarre in that the data type is not being reported correctly, it 
should be LIST or something like that, AND we're getting too many columns in 
the metadata.  I'll submit a JIRA as well, but could someone please take a look?
Thanks,
-- C





Apache Drill Hangout - June 25, 2019

2019-06-24 Thread Boaz Ben-Zvi

Hi Drillers,

  Our bi-weekly hangout is scheduled for tomorrow, Tuesday, June 25th, at 10 AM PST 
(link:https://meet.google.com/yki-iqdf-tai  
).

Please suggest any topics you would like to discuss during the hangout by 
replying to this email.

   Thanks,

 Boaz
 



[GitHub] [drill] arina-ielchiieva commented on issue #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on issue #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#issuecomment-505076823
 
 
   @vvysotskyi thanks for making the changes. +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296770734
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296787766
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-  logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
-"{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-"But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+  if (logger.isWarnEnabled()) {
 
 Review comment:
   Agree, this is very unlikely)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296762428
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = 
metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = 
metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   Yes, it is expected. Later it may be replaced with a regular list or if 
filtering will not happen, there wouldn't be allocated new object.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296781802
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static 
org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Thanks, added.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794477
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776704
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends 
BaseMetadataBuilder {
+private SchemaPath column;
+private List partitionValues;
+private Set locations;
+private long lastModifiedTime = 
BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   Agree, renamed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296779015
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, 
UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String 
type, String owner) {
 
 Review comment:
   Thanks, done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new 
HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, 
Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, 
Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-
ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType(;
-partitions.add(new PartitionMetadata(partitionColumn, 
getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, 
-1));
+new ColumnStatistics<>(statistics,
+
getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new 
MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   It is required because `HashMultimap.asMap()` returns map with Collection in 
the values, but for `HashMultimap` used set. To avoid problems for the case 
when `HashMultimap` implementation is changed, I have replaced it with `new 
HashSet<>(value)`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776542
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296775175
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition 
for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table 
data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Thanks, added.
   2. It is used during filtering to indicate that filtering was finished and 
there was no metadata whose size exceeds 
`PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296750708
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
 
 Review comment:
   Agree, `metadataStatistics` fits better, renamed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296786531
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
 return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Yes, it can. Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753052
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+protected void 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765872
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate 
filterPredicate, Set

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296757799
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder withPartitionKeys(Map 
partitionKeys) {
+  this.partitionKeys = partitionKeys;
+  return self();
+}
+
+public BaseTableMetadataBuilder withInterestingColumns(List 
interestingColumns) {
+  this.interestingColumns = 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737602
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
 .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
 .addDeserializer(SchemaPath.class, new SchemaPath.De());
 mapper.registerModule(deModule);
+mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, 
"numeric-equi-depth"));
 
 Review comment:
   It would be nice, but I think I can break backward compatibility since it 
was defined earlier here: 
https://github.com/apache/drill/blob/05a1a3a888a7408bde683acc36f406fbd2459254/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/Histogram.java#L31
 
   So all previously created stats files wouldn't be deserialized correctly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753431
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+protected void 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296772407
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   This constant will be used for creating a segment column name to avoid 
depending on the values of session options for partition column names.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296731422
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics 
visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) 
minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) 
minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) 
minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) 
minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) 
input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) 
{
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return valueComparator;
-}
+  /**
+   * Returns {@link ColumnStatistics} instance with set min, max values and 
nulls count statistics specified in the arguments.
+   *
+   * @param minVal min value
+   * @param maxVal max value
+   * @param 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296752797
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Thanks, removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296756147
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder withPartitionKeys(Map 
partitionKeys) {
+  this.partitionKeys = partitionKeys;
+  return self();
+}
+
+public BaseTableMetadataBuilder withInterestingColumns(List 
interestingColumns) {
+  this.interestingColumns = 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765020
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public 
GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
  */
 public abstract AbstractGroupScanWithMetadata build();
 
-public GroupScanWithMetadataFilterer withTable(TableMetadata 
tableMetadata) {
+public B withTable(TableMetadata tableMetadata) {
   this.tableMetadata = tableMetadata;
-  return this;
+  return self();
 
 Review comment:
   `self()` method was introduced to return a specific type of implementation 
instead of the base type. So we don't need to add casts for the case when 
`this` instance should be returned.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296743285
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   It is also used in `ColumnStatistics`. Set package default visibility.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296761257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression 
filterExpr, OptimizerRulesCont
 if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and 
produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the the data or prunes all the data, sets 
corresponding value to
 
 Review comment:
   Agree, thanks for pointing this, replaced it with nested lists.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296728354
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public 
HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) 
source);
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   This method came from `GroupScanWithMetadataFilterer` and is used to return 
the correct type of `this` instance to avoid casts in parent classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737969
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import 
org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Agree, removed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753640
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Done, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746956
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new 
ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind 
statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws 
IOException {
+return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws 
JsonProcessingException {
 
 Review comment:
   Thanks, done for this class and for `ColumnStatistics`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753490
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+protected void 

[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296767257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate 
filterPredicate, Set schemaPathsInExpr) {
+protected void filterSegmentMetadata(OptionManager optionManager,
+ FilterPredicate filterPredicate,
+ Set schemaPathsInExpr) {
   if (!matchAllMetadata) {
-if (!source.getPartitionsMetadata().isEmpty()) {
-  if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+if (!source.getSegmentsMetadata().isEmpty()) {
+  if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+  
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
 matchAllMetadata = true;
-partitions = filterAndGetMetadata(schemaPathsInExpr, 
source.getPartitionsMetadata(), filterPredicate, optionManager);
+segments = filterAndGetMetadata(schemaPathsInExpr,
+source.getSegmentsMetadata().values(),
+filterPredicate,
+optionManager).stream()
+.collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
 
 Review comment:
   Thanks, formatted the code and added `BinaryOperator`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746701
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new 
ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind 
statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws 
IOException {
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


UDF Not Registering

2019-06-24 Thread Charles Givre
Hello Drill Devs, 
I'm having an issue with some UDFs I'm working on.   Basically, whenever I 
write a UDF which returns a complex type, List or Map,  Drill does not 
recognize the UDF.  I know that Drill CAN do this, as I have written other UDFs 
that return maps, so I'm a little stumped as to why this is happening.  Here is 
the full stack trace and UDF code.  Any help would be greatly appreciated as 
I'm sure I'm missing some small step somewhere.
Thanks,
-- C

java.lang.Exception: org.apache.drill.common.exceptions.UserRemoteException: 
VALIDATION ERROR: From line 1, column 8 to line 1, column 38: No match found 
for function signature get_mx_records()


[Error Id: 1b745443-e05d-4b7a-ada0-85a5a1f23567 ]
For query: select get_mx_records('216.239.36.21') as record from (values(1))

at 
org.apache.drill.test.DrillTestWrapper.compareMergedOnHeapVectors(DrillTestWrapper.java:639)
at 
org.apache.drill.test.DrillTestWrapper.compareOrderedResults(DrillTestWrapper.java:593)
at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:166)
at org.apache.drill.test.TestBuilder.go(TestBuilder.java:145)
at 
org.apache.drill.exec.udfs.TestDNSFunctions.testGetMXNames(TestDNSFunctions.java:101)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.drill.exec.rpc.RpcException: 
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 8 to line 1, column 38: No match found for function signature 
get_mx_records()


[Error Id: 1b745443-e05d-4b7a-ada0-85a5a1f23567 ]
at 
org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60)
at 
org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:881)
at 
org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:583)
at org.apache.drill.test.QueryBuilder.results(QueryBuilder.java:339)
at 
org.apache.drill.test.ClusterFixture$FixtureTestServices.testRunAndReturn(ClusterFixture.java:575)
at 
org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:935)
at 
org.apache.drill.test.DrillTestWrapper.compareMergedOnHeapVectors(DrillTestWrapper.java:607)
... 5 more
Caused by: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION 
ERROR: From line 1, column 8 to line 1, column 38: No match found for function 
signature get_mx_records()


   






@FunctionTemplate(name = "get_mx_records",
  scope = FunctionTemplate.FunctionScope.SIMPLE,
  nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)

public static class MXRecordListFunction implements DrillSimpleFunc {

  @Param
  VarCharHolder ipaddress;

  @Output
  BaseWriter.ListWriter out;

  @Inject
  DrillBuf buffer;

  @Override
  public void setup() {

  }

  @Override
  public void eval() {
String ipString = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(ipaddress.start,
 ipaddress.end, ipaddress.buffer);
String MXRecordName = "";
VarCharHolder temp = new VarCharHolder();
out.startList();
try {
  org.xbill.DNS.Record[] records = new org.xbill.DNS.Lookup("gmail.com", 
Type.MX).run();
  for (int i = 0; i < records.length; i++) {
org.xbill.DNS.MXRecord mx = (org.xbill.DNS.MXRecord) records[i];
System.out.println("Nameserver " + mx.getTarget());
MXRecordName = mx.getTarget().toString();
temp = new VarCharHolder();

temp.buffer = buffer;
temp.start = 0;
temp.end = MXRecordName.getBytes().length;
buffer.setBytes(0, MXRecordName.getBytes());

out.varChar().write(temp);
  }
} catch(Exception e){
  MXRecordName = "Unknown";
  temp = new VarCharHolder();
  temp.buffer = buffer;
  temp.start = 0;
  temp.end = MXRecordName.getBytes().length;
  buffer.setBytes(0, MXRecordName.getBytes());
  out.varChar();
}
out.endList();
  }
}



[DISCUSS] New approach for using Drill-Calcite fork

2019-06-24 Thread Volodymyr Vysotskyi
Hi all,

Currently, Calcite fork with Drill-specific commits is placed in
https://github.com/mapr/incubator-calcite. Though it is a public
repository, it is problematic to provide writable access for most of the
cases.

Another more frequent problem is deploying new Drill-Calcite versions to
the maven repository (currently they are deployed to
http://repository.mapr.com/nexus/content/repositories/drill-optiq/). Only
several people have writable access for it, and there is no way to provide
it to more people, in particular committers and PMCs.

To resolve these problems, I propose to create a personal repository (I can
create it, here is a test version:
https://github.com/vvysotskyi/drill-calcite) and add all committers and
PMCs as collaborators to it.

To resolve the problem with deploys I propose to use https://jitpack.io, so
it will automatically deploy a newer version when it will be required.
The minor thing which should be mentioned - instead of *org.apache.calcite*
groupId will be used groupId for specific GitHub repo.

The following rules will be used to clarify the process:
- Push changes to the repository (including commit with bumping up the
version)
- Create a new tag for bumped-up version - tag name should match the
version, for example, *1.18.0-drill-r2* and push it to the remote repo
- Bump up version in Drill

Are there are any objections or ideas on this?

Kind regards,
Volodymyr Vysotskyi


[GitHub] [drill] arina-ielchiieva commented on issue #1809: DRILL-6951: Row set based mock data source

2019-06-24 Thread GitBox
arina-ielchiieva commented on issue #1809: DRILL-6951: Row set based mock data 
source
URL: https://github.com/apache/drill/pull/1809#issuecomment-505004114
 
 
   @paul-rogers looks good, please squash the commits.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1813: DRILL-7306: 
Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#discussion_r296713307
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ##
 @@ -398,6 +363,40 @@ public void addContext(UserException.Builder builder) {
 }
   }
 
+  /**
+   * Initialize the scan framework builder with standard options.
+   * Call this from the plugin-specific
+   * {@link #frameworkBuilder(OptionManager, EasySubScan)} method.
+   * The plugin can then customize/revise options as needed.
 
 Review comment:
   Please add two params to the Javadoc as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695307
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder withPartitionKeys(Map 
partitionKeys) {
+  this.partitionKeys = partitionKeys;
+  return self();
+}
+
+public BaseTableMetadataBuilder withInterestingColumns(List 
interestingColumns) {
+  

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694014
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694427
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
 
 Review comment:
   What the difference between statistics and column statistics? Maybe 
statistics should be named better, for example, generalStatistics or 
metadataStatistics?
   I think for Metastore we used `metadataStatistics` naming ...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696051
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static 
org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Can you please add example.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695012
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Do you think `with` can be removed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695117
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Same here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691929
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
 
 Review comment:
   Add timestamp unit of measurement.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691350
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition 
for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table 
data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Add java doc
   2. Where none can be used?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693580
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName;
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder withPartitionKeys(Map 
partitionKeys) {
+  this.partitionKeys = partitionKeys;
+  return self();
+}
+
+public BaseTableMetadataBuilder withInterestingColumns(List 
interestingColumns) {
+  

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696592
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new 
ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind 
statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws 
IOException {
+return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws 
JsonProcessingException {
 
 Review comment:
   Please apply the same for other classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296680787
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression 
filterExpr, OptimizerRulesCont
 if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and 
produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the the data or prunes all the data, sets 
corresponding value to
 
 Review comment:
   I believe html formatting has notion of nested lists rather than doing 
custom paragraph with dash.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296621362
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public 
HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) 
source);
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   Can you please explain where this method came from?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296684303
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate 
filterPredicate, Set schemaPathsInExpr) {
+protected void filterSegmentMetadata(OptionManager optionManager,
+ FilterPredicate filterPredicate,
+ Set schemaPathsInExpr) {
   if (!matchAllMetadata) {
-if (!source.getPartitionsMetadata().isEmpty()) {
-  if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+if (!source.getSegmentsMetadata().isEmpty()) {
+  if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+  
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
 matchAllMetadata = true;
-partitions = filterAndGetMetadata(schemaPathsInExpr, 
source.getPartitionsMetadata(), filterPredicate, optionManager);
+segments = filterAndGetMetadata(schemaPathsInExpr,
+source.getSegmentsMetadata().values(),
+filterPredicate,
+optionManager).stream()
+.collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
 
 Review comment:
   ```suggestion
   .collect(Collectors.toMap(
   SegmentMetadata::getPath,
   Function.identity()));
   ```
   Plus what about duplicates handling? It would be safer to add `(o, n) -> n` 
but of course if you did not intend to fail on duplicate.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691639
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
 
 Review comment:
   ```suggestion
  * Allows to check the time, when any files were modified. It is in Unix 
Timestamp
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682028
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = 
metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = 
metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   Using Collections emptyMap or emptyList creates unmodifiable objects, is 
this expected?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296689298
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import 
org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Again, no objections but just per my opinion this is an overhead.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296685467
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
 .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
 .addDeserializer(SchemaPath.class, new SchemaPath.De());
 mapper.registerModule(deModule);
+mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, 
"numeric-equi-depth"));
 
 Review comment:
   Do you think it makes sense to add `histogram` word as well: 
`numeric-equi-depth-histogram`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690973
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Please write better java doc: "Class that specifies metadata type ..." and 
provide an example.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296687506
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-  logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
-"{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-"But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+  if (logger.isWarnEnabled()) {
 
 Review comment:
   No objections for this change but what are the odds of warn level being 
disabled? :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682838
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate 
filterPredicate, Set

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296692394
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends 
BaseMetadataBuilder {
+private SchemaPath column;
+private List partitionValues;
+private Set locations;
+private long lastModifiedTime = 
BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   I think you can omit adding with, example: `locations`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691160
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   java doc: "Metadata that can be applicable to any type"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693933
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682165
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public 
GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
  */
 public abstract AbstractGroupScanWithMetadata build();
 
-public GroupScanWithMetadataFilterer withTable(TableMetadata 
tableMetadata) {
+public B withTable(TableMetadata tableMetadata) {
   this.tableMetadata = tableMetadata;
-  return this;
+  return self();
 
 Review comment:
   Why `self()` method is better than returning `this`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694074
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
columnsStatistics) {
+  this.columnsStatistics = columnsStatistics;
+  return self();
+}
+
+public T withStatistics(Collection statistics) {
+  this.statistics = statistics;
+  return self();
+}
+
+

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690696
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   Where this constant will be used?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296686176
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
 return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Not sure if we need dash here, can be this covered with nested list?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615585
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new 
ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind 
statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws 
IOException {
 
 Review comment:
   Rename: `deserialize` -> `of`, `serialized` -> `jsonString`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296688086
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new 
HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, 
Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, 
Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-
ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType(;
-partitions.add(new PartitionMetadata(partitionColumn, 
getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, 
-1));
+new ColumnStatistics<>(statistics,
+
getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new 
MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   Why cast is needed here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695918
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, 
UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String 
type, String owner) {
 
 Review comment:
   Make constructor private and add builder.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296622444
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics 
visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) 
minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) 
minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) 
minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) 
minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) 
input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) 
{
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return valueComparator;
-}
+  /**
+   * Returns {@link ColumnStatistics} instance with set min, max values and 
nulls count statistics specified in the arguments.
+   *
+   * @param minVal min value
+   * @param maxVal max value
+   * 

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615995
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new 
ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind 
statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws 
IOException {
+return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws 
JsonProcessingException {
 
 Review comment:
   Should be class level method without parameters: `public String 
toJsonString()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #1810: DRILL-7271: 
Refactor Metadata interfaces and classes to contain all needed information for 
the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296614730
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   private?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-7307) casthigh for decimal type can lead to the issues with VarDecimalHolder

2019-06-24 Thread Dmytriy Grinchenko (JIRA)
Dmytriy Grinchenko created DRILL-7307:
-

 Summary: casthigh for decimal type can lead to the issues with 
VarDecimalHolder
 Key: DRILL-7307
 URL: https://issues.apache.org/jira/browse/DRILL-7307
 Project: Apache Drill
  Issue Type: Bug
Reporter: Dmytriy Grinchenko
Assignee: Dmytriy Grinchenko
 Fix For: 1.17.0


The decimal cast may lead to issues with VarDercimal transformation and issues 
at uml functions which using casthigh under the hood

Example: 
{code}
apache drill> select casthigh(cast(1025.0 as decimal(28,8)));
Error: SYSTEM ERROR: CompileException: Line 25, Column 60: "isSet" is neither a 
method, a field, nor a member class of 
"org.apache.drill.exec.expr.holders.VarDecimalHolder"

Fragment 0:0

Please, refer to logs for more information.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [drill] vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
vvysotskyi commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296612824
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
 "{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), 
filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread GitBox
ihuzenko commented on a change in pull request #1810: DRILL-7271: Refactor 
Metadata interfaces and classes to contain all needed information for the File 
based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296327891
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
 "{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), 
filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   add ```isDebugEnabled()``` check before call 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF

2019-06-24 Thread GitBox
arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-504905705
 
 
   @paul-rogers I am still unclear if you have tried the following query for 
log plugin data: `select * from table(t(schema=>'inline=(col1 varchar)'))` 
where `t` is table with log plugin data. Did you try it? I suppose it should 
work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services