[DISCUSSION] ARM-based compatibility tests

2021-01-26 Thread luoc
Hi all,

I have some ARM-based machines (not x86 architecture) and would like to run 
ARM-based compatibility tests. I know that Netty must first be bumped to 4.1 
(see also #9804 <https://github.com/netty/netty/pull/9804>). Is there anything 
else we need to upgrade? Thanks for your time.
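
As a small illustration, a test script could first confirm that it is really running on an ARM machine before the compatibility tests start. This is only a sketch: the helper name `is_arm` and the prefix list are my own assumptions, not something taken from Drill or Netty.

```python
import platform

# Machine names commonly reported for 64-bit and 32-bit ARM.
# Assumption: this list is illustrative, not exhaustive.
ARM_PREFIXES = ("aarch64", "arm")


def is_arm(machine: str) -> bool:
    """Return True if a platform.machine()-style string looks like ARM."""
    return machine.lower().startswith(ARM_PREFIXES)


if __name__ == "__main__":
    # Report the architecture of the machine running this script.
    machine = platform.machine()
    print(f"{machine}: {'ARM' if is_arm(machine) else 'not ARM'}")
```

The check works on the string only, so it can also be applied to architecture names collected from remote CI machines.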

Kind regards
luoc

Re: [DISCUSSION] ARM-based compatibility tests

2021-01-27 Thread luoc
Hi,

@Ted Dunning, I saw that Apple has released an ARM-based Mac (the CPU is called 
M1). Maybe it could drive the open source ecosystem.

@Ganesh, have you donated machines to other Apache TLPs?

> On Jan 27, 2021, at 8:21 PM, Vitalii Diravka wrote:
> 
> Hi Ganesh!
> 
> Could you give more info how it can be used, for what period of time and
> under what terms of use?
> 
> Thanks
> 
> Kind regards
> Vitalii
> 
> 
> On Tue, Jan 26, 2021 at 7:42 PM Ganesh Raju  wrote:
> 
>> We could also donate ARM machines to setup in CI, if it would make sense.
>> 
>> Regards
>> Ganesh
>> 
>> On Tue, Jan 26, 2021 at 11:02 AM Ted Dunning 
>> wrote:
>> 
>>> I did some minimal testing in embedded mode way back, but nothing
>> serious.
>>> 
>>> I saw no issues at all.
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> IRC: ganeshraju@#linaro on irc.freenode.net <http://irc.freenode.net/>
>> 



Re: [DISCUSSION] ARM-based compatibility tests

2021-01-28 Thread luoc
Hi,

Drill is a system that can take advantage of other components (such as HDFS, 
HBase and more) without relying on them,
so I think the first challenge is how to provide the underlying components.

> On Jan 28, 2021, at 5:15 AM, Ted Dunning wrote:
> 
> Cool.
> 
> 
> 
> On Wed, Jan 27, 2021 at 12:40 PM Ganesh Raju  wrote:
> 
>> Ted,
>> This hardware would be a proper ARM-based datacenter server VM instance
>> 
>> Ganesh
>> 
>> On Wed, Jan 27, 2021 at 12:00 PM Ted Dunning 
>> wrote:
>> 
>>> Yes. The ARM-based macs sound pretty exciting.
>>> 
>>> My own laptop is about 5 years old so it might be time to think about
>> it. I
>>> have two ARMs on my desk and 4 Intel machines. The odds could even up if
>>> the wind blows right.
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> IRC: ganeshraju@#linaro on irc.freenode.net <http://irc.freenode.net/>
>> 



Re: [DISCUSSION] ARM-based compatibility tests

2021-02-05 Thread luoc
Hi,
  Thanks. Is it possible to add the ARM instances as optional (not required) 
for CI?

> On Feb 4, 2021, at 11:46 PM, Ganesh Raju wrote:
> 
> Hi, Yes, aware of that. We can help in setting up all the CI. I am working
> on procuring the server instances, once ready, will create a JIRA to track
> progress
> 
> Regards
> Ganesh
> 
> 
> -- 
> IRC: ganeshraju@#linaro on irc.freenode.net <http://irc.freenode.net/>



Re: Regular Video Calls?

2021-03-03 Thread luoc
Wow!
  That sounds good, even though I only know a little spoken English, because I 
don’t think you are going to block people who only listen to the 
discussion. I am in GMT+8.

> On Mar 4, 2021, at 3:13 AM, Ted Dunning wrote:
> 
> I am still around, but not super active lately. Real life has intruded a
> lot over the last two years.
> 
> 
> 
> On Wed, Mar 3, 2021 at 11:02 AM Curtis Lambert 
> wrote:
> 
>> Thanks Ted! I've been reading the archive history for the mailing list and
>> see you on there a lot, right from the start. Glad to see you're still
>> around and active on here!
>> 
>> 
>> 
>> Curtis Lambert
>> CTO
>> Email: cur...@datdistillr.com
>> Phone: + 706-402-0249
>> 
>> 
>> 
>> On Wed, Mar 3, 2021 at 12:26 PM Ted Dunning  wrote:
>> 
>>> Curtis,
>>> 
>>> I think that would be a great thing. The Drill community has changed over
>>> the last few years and having periodic events could help people come
>>> together in a new way.
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Mar 3, 2021 at 6:35 AM Curtis Lambert 
>>> wrote:
>>> 
>>>> All,
>>>> 
>>>> I'm still very new to Drill but as I'm getting spun up on things I noticed
>>>> there used to be google hangout meets every two weeks but they appear to
>>>> have stopped in 2017. Looking to gather input on if they are worth starting
>>>> back up and what points they would cover (recognizing all decisions are
>>>> made here not in the meetings). I'm willing to organize and host if the
>>>> interest is there.
>>>> 
>>>> Please weigh in on these points:
>>>> 
>>>>   - If we had regular video calls would you attend?
>>>>   - What is the group's preferred application for that? (Zoom/Google?)
>>>>   - Periodicity of the calls (every two weeks, monthly, other?)
>>>>   - Time of day? (lets use Zulu/GMT/UTC for this point to normalize)
>>>>   - Potential topics/scope? (I think some combination of design
>>>>     discussions and component walkthrough/learning/presenting would be good
>>>>     for expanding and invigorating the community)
>>>> 
>>>> Curtis Lambert
>>>> CTO
>>>> Email: cur...@datdistillr.com
>>>> Phone: + 706-402-0249
>>>> 
>>> 
>> 



Re: Regular Video Calls?

2021-03-04 Thread luoc
Hello,
 Thanks Ted, Curtis. It seems that I cannot keep missing the discussion forever.
 BTW, I think cgivre is slacking now (hahaha...) and forgot to tell you that we 
have a lot of users who also love to discuss Drill in the Slack channel. Let me 
paste the links... it is hard to edit the email by phone.

> On Mar 4, 2021, at 23:16, Ted Dunning wrote:
> 
> Luoc,
> 
> Don't feel shy about not being able to speak well. We all have important
> languages that we can't speak well. If you can mostly understand spoken
> language, you can still easily participate. For one thing, there is the
> chat as Curtis points out. Based on your email, it looks like you are
> pretty good with written English, in any case.
> 
> For another point, there may be somebody else on the meeting who can
> understand a language you are more comfortable with.
> 
> But the most important thing to remember is that a video meeting is not a
> replacement for the mailing list. Any important ideas from a live meeting
> need to be brought back to the mailing list for discussion and decision
> making. It isn't reasonable to expect all of our worldwide participants to
> be in any kind of realtime meeting. I am speaking right now at 7AM because
> I got up for an inconvenient meeting (after staying up late) so I really
> feel that pain.
> 
> I love your enthusiasm for the project and it would be terrible to lose
> that!
> 
> 
> 


Re: Regular Video Calls?

2021-03-04 Thread luoc

The Slack channel is here:
  https://bit.ly/3t4rozO
It would be better if we could also post the plans on Slack. Thanks.



[DISCUSSION] One of the most impressive features

2021-04-02 Thread luoc
Hi all,
  I'm from the Drill team. There will be many new features in release 1.19; 
however, I’m also looking forward to hearing about how you use Drill.
  At ApacheCon 2021 (and ApacheCon 2021 Asia), there is a track with a talk 
about Drill, so I hope you will respond: what is one of the most 
impressive features of Drill in your projects?
  This question is for all developers and Drill users. Thanks for your time.

Kind regards
luoc

Re: [DISCUSSION] One of the most impressive features

2021-04-03 Thread luoc
Hi Prabhakar,
  Great. Drill can combine data from multiple data sources on the fly in a 
single query; federated query and analysis is one of the features of Apache 
Drill. That's exactly what I love about Drill.
Release 1.19 adds support for Cassandra/Scylla, Elasticsearch, Splunk, 
XML and more. They are based on the EVF framework, which makes them more stable 
and more powerful than in previous versions.
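
To make the federated query idea concrete, here is a minimal sketch that builds a request for Drill's REST endpoint (`POST /query.json`) joining a JSON file with a database table. The plugin names (`dfs`, `pg`), the file path, the table and the column names are all hypothetical, and the sketch assumes a Drillbit listening on the default web port 8047.

```python
import json
import urllib.request

# Hypothetical storage plugin names, path, table and columns: "dfs" reads a
# JSON file, "pg" stands for a JDBC plugin pointing at a PostgreSQL database.
FEDERATED_SQL = """
SELECT o.order_id, c.name
FROM dfs.`/data/orders.json` AS o
JOIN pg.public.customers AS c ON o.customer_id = c.id
"""


def build_payload(sql: str) -> bytes:
    """Encode a SQL statement as the JSON body sent to POST /query.json."""
    return json.dumps({"queryType": "SQL", "query": sql.strip()}).encode("utf-8")


def run_query(sql: str, base_url: str = "http://localhost:8047") -> dict:
    """Send the query to a running Drillbit and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/query.json",
        data=build_payload(sql),  # a request with a body is sent as POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the request body here; run_query() needs a live Drillbit.
    print(build_payload(FEDERATED_SQL).decode("utf-8"))
```

The point is that both sources appear in one SQL statement, so the join happens inside Drill and not in the client code.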

> On Apr 3, 2021, at 8:19 PM, Markenson França wrote:
> 
> Hi Luoc and Prabhakar,
> 
> We use Drill for data merging in the Brazilian Federal Court at Rio de Janeiro.
> 
> We developed two stages: extraction and consolidation.
> 
> The extractor gets data from several databases (Oracle, MySQL, Postgres,
> SQL Server, Ingres, HTTP and MUMPS) and puts it in a standard plain text
> format.
> 
> The consolidator is a piece of Python code that uses Drill to get all the
> plain text data pieces and combine them into the same standard format.
> 
> The result is data blocks available by thematic area (HR, acquisition
> sector, law data, etc.) used directly by users (Excel importing via
> network paths) or available through Metabase*.
> 
> Using Drill at the consolidation stage, we avoid overloading production
> servers and can join unthinkable databases like MUMPS+Oracle+SQL Server.
> Drill consolidation works at the speed of light (thanks to Drill
> performance). Querying plain data with SQL is amazing.
> 
> Regards,
> Markenson
> 
> *I have been using a CSV driver for Metabase that we developed to publish
> Drill data for users (https://github.com/Markenson/csv-metabase-driver).
> I'm trying to develop a driver for Drill via JDBC.
> 
> 
> 
> On Sat, Apr 3, 2021 at 08:18, Prabhakar Bhosaale wrote:
> 
>> Hi Luoc,
>> the impressive feature for me is querying data from files
>> (JSON, CSV, Parquet, etc.) using SQL syntax. This makes life very easy.
>> Also, I am not sure as I have not tried it, but I guess I can query two
>> different storages (a JSON file and an Oracle database) and combine the data.
>> Thx
>> 
>> Regards
>> Prabhakar
>> 



Re: [DISCUSSION] One of the most impressive features

2021-04-03 Thread luoc
Hi Markenson,
  That’s wonderful. Your use case is very detailed and comprehensive. There's a 
good chance it will be shown at ApacheCon. I'm also looking forward to your 
contribution. Please let us know if you have any issues in development.

> On Apr 3, 2021, at 8:31 PM, Prabhakar Bhosaale wrote:
> 
> Hey Luoc,
> Nice to hear about the updates in 1.19. I will see how I can fit it into a
> real use case.
> 
> 
> Regards
> Prabhakar
> 



Re: [DISCUSSION] One of the most impressive features

2021-04-03 Thread luoc
Hi Prabhakar,
  We welcome your feedback.




Re: Requesting a release

2021-04-12 Thread luoc
Hi,
  Drill is a community-driven project, so contributions are always welcome 
(in any way). There is a document about the release.

> On Apr 13, 2021, at 7:17 AM, Ted Dunning wrote:
> 
> 
> Laurent, 
> 
> There are definitely some steps that require privileges, but the management 
> of the process doesn't require privileges.
> 
> The key steps are a) developing consensus on the content of the release, b) 
> building a release candidate and c) conducting the vote. After these, pushing 
> the artifacts may require some mojo, but it is easy to get somebody to help. 
> None of these other steps require more than a commit bit.
> 
> Remember that the core point of a release is the community involvement, not 
> the technical aspects of packaging.
> 
> On 2021/04/12 21:45:22, Laurent Goujon  wrote: 
>> Hi Ted,
>> 
>> I was led to believe that only a PMC member could perform some of the
>> release tasks, but if not the case, I'm happy to volunteer for the next
>> one. Since it would be my first release, is there any document detailing
>> the list of tasks to be completed?
>> 
>> On Mon, Apr 12, 2021 at 1:55 PM Ted Dunning  wrote:
>> 
>>> Hey Ray,
>>> 
>>> Any Drill committer should be able to act as a release manager.
>>> 
>>> My guess is that you know several Drill committers at Dremio who might be
>>> able to help with this.
>>> 
>>> 
>>> 
>>> On Mon, Apr 12, 2021 at 12:00 PM Ray Lum  wrote:
>>> 
>>>> Hi Drill community,
>>>> 
>>>> Is there a process for requesting a release of the latest code currently in
>>>> master? I am keen on adopting some of the changes if they were in an
>>>> official release.
>>>> 
>>>> Thanks kindly,
>>>> Ray
>>>> 
>>> 
>> 



Relocate the documents of MapR

2021-04-23 Thread luoc
Hi all,
  The Drill website has a page that introduces Value Vectors 
<http://drill.apache.org/docs/value-vectors/>. However, two links on it are 
dead (please click `Operators` and `Record Batch` on the page). Can anyone 
help us find the new addresses, perhaps using a copy of the 
MapR docs? Thanks for your time.

Kind regards
luoc

Re: [DISCUSS] Drill Developer information

2021-04-26 Thread luoc
Hi,
  I support the wiki option. We can then link to the wiki from a map document 
(on the dev branch).

> On Apr 26, 2021, at 17:42, James Turton wrote:
> 
> I started to answer "1. The Drill website" thinking that it's better to 
> consolidate the docs rather than fragment them.  But then I realised that 
> these docs are for a different audience than those on the website, so having 
> them somewhere else is less of a concern. And possibly even better because
> 
> 1. a question like "Is it okay to write about unreleased or experimental 
> features?" is easily answered with "Yes" and
> 2. they can be authored the wiki way of ad-hoc independent edits while 
> official docs should probably try to follow a more controlled process.
> 
> Perhaps ideally Paul Rogers' Drill wiki would be merged into such a new dev 
> wiki...
> 
>> On 2021/04/26 11:21, Vitalii Diravka wrote:
>> Hi devs
>> 
>> Currently we have good documentation for Drill developers in Drill source
>> 
>> But to edit something there we need to create a Jira ticket, which can be
>> overkill for some minor edit. So we can place it in a better place.
>> 
>> What way do you prefer:
>> 1. Apache Drill website
>> 2. GitHub Wiki (currently it is not enabled for Drill, but a good example
>> is Paul's Drill Wiki)
>> 
>> * Both 1 and 2 better in editing (probably 2 slightly better here).
>> * Possibly 1 is better in searching (via Apache Drill website or Google),
>> but 2 probably is only
>> with GitHub search via Wiki pages.
>> * 2 is essential for devs
>> 
>> Thoughts?
>> 
>> Kind regards
>> Vitalii
>> 


Re: [DISCUSS] Drill 1.19.0 release

2021-05-03 Thread luoc
Hi Laurent,
  Let me clean up the JIRA list first. Please hold on.

> On May 4, 2021, at 12:23 PM, Laurent Goujon wrote:
> 
> Thanks for all the answers
> 
> So the issues I found based on the feedback are:
> 
>   - DRILL-7878: Fix LGTM Alerts
>   - DRILL-7871: StoragePluginStore instances for different users
>   - DRILL-7908: Fix GitHub Actions CI
>   - DRILL-7904: Update to 30-jre Guava version
>   - DRILL-7826: Merge Pcap and Pcapng format plugin based on EVF
>   - DRILL-7828: Refactor Pcap and Pcapng format plugin
>   - DRILL-7910: Bumps commons-io from 2.4 to 2.7
>   - DRILL-7901: Bump junit from 4.12 to 4.13.1
> 
> I wanted to propose Monday May 10th to do the first release candidate, but
> I have some concerns about some of the changes which may not be ready by
> then considering they seem to involve some level of effort and are in very
> early stage: The LGTM alert changes and the StoragePluginStore model
> change. JUnit version update might also become quite a large change if
> instead of moving to 4.13.1, Drill is switching to JUnit5.
> 
> What do people think?
> 
> On Sat, Apr 24, 2021 at 1:00 PM Vitalii Diravka  wrote:
> 
>> Hi Laurent,
>> 
>> I want to include:
>> DRILL-7871 (preparing PR)
>> DRILL-7908 (preparing PR)
>> DRILL-7904 (PR is opened, in review)
>> DRILL-7828 (PR is opened, review is almost completed)
>> 
>> All these tasks are expected to be completed in a week
>> 
>> Kind regards
>> Vitalii
>> 
>> 
>> On Fri, Apr 23, 2021 at 9:25 PM Charles Givre  wrote:
>> 
>>> Hi Laurent,
>>> We have a few PRs pending which I'd like to see in the next version which
>>> are:
>>> 1.  The update(s) and bug fixes to the Mongo plugin.
>>> 2.  There is an extended PR for bug fixes which clean up a lot of alerts
>>> generated by LGTM
>>> 3.  There are a few other library updates which are pending.
>>> 4.  We have some work which changes the access model around storage
>>> plugins which would be good for this release
>>> 5.  The PCAP/PCAP-NG consolidation is awaiting review.
>>> 
>>> I think that's it.
>>> -- C
>>> 
>>>> On Apr 22, 2021, at 12:33 PM, Laurent Goujon 
>> wrote:
>>>> 
>>>> Hello everyone,
>>>> 
>>>> It has been more than 6 months since the last release, and I believe
>> this
>>>> would be a good time to discuss the next one.
>>>> 
>>>> As mentioned in a previous email thread, I am volunteering to be the
>>>> release manager, and I'm looking forward to working with the whole
>>> community
>>>> to make another great release.
>>>> 
>>>> We have around 80 changes in master since the last release, and there
>> are
>>>> several changes open for review too. It would be nice if people could
>>> reply
>>>> to this email and share issues which should be part of that release, so
>>> we
>>>> can decide on an initial cut-off date.
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Laurent
>>> 
>>> 
>> 



Re: [DISCUSS] Drill 1.19.0 release

2021-05-03 Thread luoc
Hi Laurent,
  DRILL-7908
  DRILL-7826
  DRILL-7828
  DRILL-7910
The JIRA issues above have been resolved. I will make suggestions on the PRs 
this evening (GMT+8).

> On May 4, 2021, at 12:34 PM, luoc wrote:
> 
> Hi Laurent,
>  Let me clean up the JIRA list first. Please hold on.
> 
>> On May 4, 2021, at 12:23 PM, Laurent Goujon wrote:
>> 
>> Thanks for all the answers
>> 
>> So the issues I found based on the feedback are:
>> 
>>  - DRILL-7878: Fix LGTM Alerts
>>  <https://issues.apache.org/jira/browse/DRILL-7878>
>>  - DRILL-7871: StoragePluginStore instances for different users
>>  <https://issues.apache.org/jira/browse/DRILL-7871>
>>  - DRILL-7908: Fix GitHub Actions CI
>>  <https://issues.apache.org/jira/browse/DRILL-7908>
>>  - DRILL-7904: Update to 30-jre Guava version
>>  <https://issues.apache.org/jira/browse/DRILL-7904>
>>  - DRILL-7826: Merge Pcap and Pcapng format plugin based on EVF
>>  <https://issues.apache.org/jira/browse/DRILL-7826>
>> - DRILL-7828: Refactor Pcap and Pcapng format plugin
>> <https://issues.apache.org/jira/browse/DRILL-7828>
>>  - DRILL-7910: Bumps commons-io from 2.4 to 2.7
>>  <https://issues.apache.org/jira/browse/DRILL-7910>
>>  - DRILL-7901: Bump junit from 4.12 to 4.13.1
>>  <https://issues.apache.org/jira/browse/DRILL-7901>
>> 
>> I wanted to propose Monday May 10th to do the first release candidate, but
>> I have some concerns about some of the changes which may not be ready by
>> then considering they seem to involve some level of effort and are in very
>> early stage: The LGTM alert changes and the StoragePluginStore model
>> change. JUnit version update might also become quite a large change if
>> instead of moving to 4.13.1, Drill is switching to JUnit5.
>> 
>> What do people think?
>> 
>> On Sat, Apr 24, 2021 at 1:00 PM Vitalii Diravka  wrote:
>> 
>>> Hi Laurent,
>>> 
>>> I want to include:
>>> DRILL-7871 <https://issues.apache.org/jira/browse/DRILL-7871> (preparing
>>> PR)
>>> DRILL-7908 <https://issues.apache.org/jira/browse/DRILL-7908> (preparing
>>> PR)
>>> DRILL-7904 <https://issues.apache.org/jira/browse/DRILL-7904> (PR is
>>> opened, in review)
>>> DRILL-7828 <https://issues.apache.org/jira/browse/DRILL-7828> (PR is
>>> opened, review is almost completed)
>>> 
>>> All these tasks are expected to be completed in a week
>>> 
>>> Kind regards
>>> Vitalii
>>> 
>>> 
>>> On Fri, Apr 23, 2021 at 9:25 PM Charles Givre  wrote:
>>> 
>>>> Hi Laurent,
>>>> We have a few PRs pending which I'd like to see in the next version which
>>>> are:
>>>> 1.  The update(s) and bug fixes to the Mongo plugin.
>>>> 2.  There is an extended PR for bug fixes which clean up a lot of alerts
>>>> generated by LGTM
>>>> 3.  There are a few other library updates which are pending.
>>>> 4.  We have some work which changes the access model around storage
>>>> plugins which would be good for this release
>>>> 5.  The PCAP/PCAP-NG consolidation is awaiting review.
>>>> 
>>>> I think that's it.
>>>> -- C
>>>> 
>>>>> On Apr 22, 2021, at 12:33 PM, Laurent Goujon 
>>> wrote:
>>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> It has been more than 6 months since the last release, and I believe
>>> this
>>>>> would be a good time to discuss the next one.
>>>>> 
>>>>> As mentioned in a previous email thread, I am volunteering to be the
>>>>> release manager, and I'm looking forward to working with the whole
>>>> community
>>>>> to make another great release.
>>>>> 
>>>>> We have around 80 changes in master since the last release, and there
>>> are
>>>>> several changes open for review too. It would be nice if people could
>>>> reply
>>>>> to this email and share issues which should be part of that release, so
>>>> we
>>>>> can decide on an initial cut-off date.
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Laurent
>>>> 
>>>> 
>>> 



Re: Test Apache Drill on Linux ARM64

2021-05-08 Thread luoc
Hi Martin,
  ARM-based systems are the trend in computing architecture: multicore and 
low-power. I'll help drive this plan.

> On May 8, 2021, at 12:47 AM, Ted Dunning wrote:
> 
> Martin,
> 
> This is exciting stuff that you are doing and very useful.
> 
> My thought is that of the options you describe, it seems like the travis
> option is a good first step because it is nearly trivial (just add the ci
> config file with a trivial build and test)
> 
> Running builds on remote builder nodes seems to me to increase
> dependencies that could cause debug actions at a later stage. I don't
> understand the level of stability that should be expected and I don't
> understand how certain that expectation should be.
> 
> I am not clear on CircleCI versus Github Actions versus travis. The
> timeouts sound better and you mention arm support, but I have no experience
> to guide.
> 
> Others probably have better and more complete thoughts than these.
> 
> 
> 
> On Fri, May 7, 2021 at 5:02 AM Martin Tzvetanov Grigorov <
> mgrigo...@apache.org> wrote:
> 
>> Hello Drill developers,
>> 
>> Recently I've tried to build Apache Drill on ARM64 hardware running on
>> Linux.
>> I have found a few issues, which are described in
>> https://issues.apache.org/jira/browse/DRILL-7911
>> 
>> I've created a few Pull Requests with fixes for each issue:
>> - https://github.com/apache/drill/pull/2217 - use TestContainers-MySQL
>> instead of Wix-Embedded-MySQL
>> - https://github.com/apache/drill/pull/2218 - Disable Storage-Splunk unit
>> tests on Linux ARM64 because there is no Docker image of Splunk for
>> Linux/arm64
>> - https://github.com/apache/drill/pull/2219 - Increase Max Direct Memory
>> 
>> Now I would like to suggest adding CI testing on ARM64 to prevent
>> regressions in the future.
>> The problem is that GitHub Actions (the CI system used by Apache Drill)
>> does not yet support ARM64 architecture.
>> 
>> Here are the possible solutions I am aware of:
>> 
>> * use TravisCI only for running `mvn install` on Linux ARM64
>> Pros:
>> - TravisCI supports Linux ARM64 out of the box and the config is quite
>> simple
>> - Might be useful later if someone wants to add testing on Linux s390x
>> Cons:
>> - Use a second CI for such specific purpose
>> 
>> * Use GitHub Actions to run the build at a remote Kubernetes cluster with
>> ARM64 nodes
>> More details about this approach can be found at
>> https://martin-grigorov.medium.com/githubactions-build-and-test-on-huaweicloud-arm64-af9d5c97b766
>> Disclaimer: I work for OpenLab Testing and Huawei sponsor us, so I can get
>> you a free account at HuaweiCloud for such setup.
>> The same setup could be used with any other Kubernetes provider!
>> 
>> * Use CircleCI instead of GitHub Actions
>> Pros:
>> - native support for both x86_64 and aarch64  (
>> https://github.com/CircleCI-Public/arm-preview-docs)
>> - CircleCI allows connecting via SSH to a builder node. This way one can
>> debug issues
>> - higher job timeout (5h) -
>> https://circleci.com/docs/2.0/runner-installation/#runner-max_run_time.
>> Currently GitHub Actions builds often fail due to the 90-minute build timeout
>> - it is less crowded than the Apache organization at GitHub Actions (
>> https://ibb.co/RpFyQQy), so there is less wait time for the build
>> Cons:
>> - work is required to migrate from GitHub Actions to CircleCI
>> 
>> I volunteer to do the work for any of these options. Just please let me
>> know which one is your preferred one!
>> 
>> Regards,
>> Martin
>> 
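
The idea behind the Splunk change quoted above (disable tests on Linux ARM64 because no Docker image exists for that platform) can be sketched as a platform check. This is only an illustration in Python, not the code from the Drill pull request, which is Java. The helper name `is_linux_arm64`, the set `ARM64_NAMES`, and the test class are hypothetical.

```python
import platform
import unittest

# Assumption: Linux usually reports "aarch64" for 64-bit ARM, while some
# other systems report "arm64". Both spellings are accepted here.
ARM64_NAMES = {"aarch64", "arm64"}


def is_linux_arm64(system=None, machine=None):
    """Return True when the given (or detected) platform is Linux on ARM64."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine.lower() in ARM64_NAMES


class ContainerBackedTest(unittest.TestCase):
    """Hypothetical test that needs a Docker image missing on linux/arm64."""

    @unittest.skipIf(is_linux_arm64(), "no linux/arm64 Docker image available")
    def test_needs_container(self):
        # Placeholder body; a real test would talk to the container.
        self.assertTrue(True)
```

Passing `system` and `machine` explicitly makes the check itself testable on any host, which is useful when the CI has no ARM64 runner yet.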



Re: [DISCUSS] Drill 1.19.0 release

2021-05-08 Thread luoc
Hi Vitalii,
  Would you mind sharing an update? Is DRILL-7904 ready for review again? And 
what is the status of DRILL-7871? Thanks.

> On May 4, 2021, at 1:10 PM, Ted Dunning wrote:
> 
> Laurent,
> 
> I don't have a stake here, so can't really comment about specifics, but the
> process is looking good.
> 
> 
> 
> On Mon, May 3, 2021 at 9:23 PM Laurent Goujon  wrote:
> 
>> Thanks for all the answers
>> 
>> So the issues I found based on the feedback are:
>> 
>>   - DRILL-7878: Fix LGTM Alerts
>>   - DRILL-7871: StoragePluginStore instances for different users
>>   - DRILL-7908: Fix GitHub Actions CI
>>   - DRILL-7904: Update to 30-jre Guava version
>>   - DRILL-7826: Merge Pcap and Pcapng format plugin based on EVF
>>   - DRILL-7828: Refactor Pcap and Pcapng format plugin
>>   - DRILL-7910: Bumps commons-io from 2.4 to 2.7
>>   - DRILL-7901: Bump junit from 4.12 to 4.13.1
>> 
>> I wanted to propose Monday May 10th to do the first release candidate, but
>> I have some concerns about some of the changes which may not be ready by
>> then considering they seem to involve some level of effort and are in very
>> early stage: The LGTM alert changes and the StoragePluginStore model
>> change. JUnit version update might also become quite a large change if
>> instead of moving to 4.13.1, Drill is switching to JUnit5.
>> 
>> What do people think?
>> 
>> On Sat, Apr 24, 2021 at 1:00 PM Vitalii Diravka 
>> wrote:
>> 
>>> Hi Laurent,
>>> 
>>> I want to include:
>>> DRILL-7871 (preparing PR)
>>> DRILL-7908 (preparing PR)
>>> DRILL-7904 (PR is opened, in review)
>>> DRILL-7828 (PR is opened, review is almost completed)
>>> 
>>> All these tasks are expected to be completed in a week
>>> 
>>> Kind regards
>>> Vitalii
>>> 
>>> 
>>> On Fri, Apr 23, 2021 at 9:25 PM Charles Givre  wrote:
>>> 
>>>> Hi Laurent,
>>>> We have a few PRs pending which I'd like to see in the next version
>> which
>>>> are:
>>>> 1.  The update(s) and bug fixes to the Mongo plugin.
>>>> 2.  There is an extended PR for bug fixes which clean up a lot of
>> alerts
>>>> generated by LGTM
>>>> 3.  There are a few other library updates which are pending.
>>>> 4.  We have some work which changes the access model around storage
>>>> plugins which would be good for this release
>>>> 5.  The PCAP/PCAP-NG consolidation is awaiting review.
>>>> 
>>>> I think that's it.
>>>> -- C
>>>> 
>>>>> On Apr 22, 2021, at 12:33 PM, Laurent Goujon 
>>> wrote:
>>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> It has been more than 6 months since the last release, and I believe
>>> this
>>>>> would be a good time to discuss the next one.
>>>>> 
>>>>> As mentioned in a previous email thread, I am volunteering to be the
>>>>> release manager, and I'm looking forward to working with the whole
>>>> community
>>>>> to make another great release.
>>>>> 
>>>>> We have around 80 changes in master since the last release, and there
>>> are
>>>>> several changes open for review too. It would be nice if people could
>>>> reply
>>>>> to this email and share issues which should be part of that release,
>> so
>>>> we
>>>>> can decide on an initial cut-off date.
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Laurent
>>>> 
>>>> 
>>> 
>> 



Re: [VOTE] Add Dependabot to Drill

2021-05-16 Thread luoc
Hi,
  +1. Yes, let's do it.

> On May 17, 2021, at 02:34, Ted Dunning wrote:
> 
> I love dependabot.
> 
> I do minimal maintenance on several dozen demo projects and having a bot
> check the dependencies for vulnerabilities is a god-send.
> 
> There is no downside. Yes, I get a bunch of pull requests when somebody
> digs up another obscure problem with Jackson, but that isn't a problem.  I
> have to worry about dependencies anyway, so why not make it relatively easy
> to do?



Re: known bug in csv header parsing

2021-05-20 Thread luoc
Hello Ted,
That's a nice idea. I did a quick review of the CSV reader but did not find 
any settings for handling such errors. Also, we have refactored the CSV format 
plugin to use EVF; please see CompliantTextBatchReader.java (it complies with 
the RFC 4180 standard for text/csv files).

> On May 20, 2021, at 13:49, Ted Dunning wrote:
> 
> I have a csv file that causes an exception when read by Drill. The file is
> slightly mal-formed (but R can read it).
> 
> Interestingly, if I don't parse the header line, I don't get the exception
> and the problematic embedded quotes are handled well. Likewise, deleting
> the first data line (which is well-formed) causes the exception to go away.
> Deleting the second data line also causes the exception to stop. Fixing the
> quoting of the included quotes also fixes the problem. Swapping the lines
> works like deleting the first line. Repeating the first line after the
> second line still gets the exception.
> 
> The file is this:
> -
> 
> desc,name
> 
> "foo","x"
> 
> "manure called "foo"","y"
> 
> -
> 
> 
> The exception is shown below. My thought is that if the CSV file is
> considered mal-formed, we should get an error on the line that says
> something along the lines of "mal-formed input". Even better would be to
> allow such lines to be omitted (up to some sanity limit) or to parse it
> correctly (which happens without headers being parsed).
> 
> Anybody have any thoughts?
> 
> Here is the R behavior (it omits the embedded quotes):
> 
>> f = read.csv("v.csv")
> 
>> f
> 
>                desc name
> 1               foo    x
> 2 manure called foo    y
> 
> 
> And here is the exception:
> 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> NegativeArraySizeException Please, refer to logs for more information.
> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
> (java.lang.NegativeArraySizeException) null
> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
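
The suggestion in the quoted message (report "mal-formed input" for the offending line, or skip such lines up to a sanity limit) can be sketched as follows. This is a minimal illustration in Python, not Drill's reader. The function names `is_well_formed` and `split_records` and the `max_bad` limit are hypothetical, and the check handles single-line records only (no embedded newlines).

```python
def is_well_formed(line, delimiter=",", quote='"'):
    """Check one single-line CSV record against RFC 4180 quoting rules."""
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: scan to the closing quote; "" is an escaped quote.
            i += 1
            while True:
                if i >= n:
                    return False  # unterminated quoted field
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        i += 2  # escaped quote inside the field
                        continue
                    i += 1  # closing quote
                    break
                i += 1
            # Only a delimiter or end of line may follow a closing quote.
            if i < n and line[i] != delimiter:
                return False
        else:
            # Unquoted field: a quote character is not allowed inside it.
            while i < n and line[i] != delimiter:
                if line[i] == quote:
                    return False
                i += 1
        if i >= n:
            return True
        i += 1  # skip the delimiter and continue with the next field


def split_records(lines, max_bad=10):
    """Separate well-formed lines from bad ones; fail past the sanity limit."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        if is_well_formed(line):
            good.append(line)
        else:
            bad.append(number)
            if len(bad) > max_bad:
                raise ValueError(
                    f"mal-formed input: more than {max_bad} bad lines, "
                    f"last at line {number}"
                )
    return good, bad


# The three-line file from the message above.
sample = ['desc,name', '"foo","x"', '"manure called "foo"","y"']
good, bad = split_records(sample)
print(bad)  # line numbers of the rejected records
```

For the sample file, only the third line is rejected, because the quote before `foo` closes the field and is followed by a character that is neither a delimiter nor the end of the line.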


Re: known bug in csv header parsing

2021-05-21 Thread luoc
Hi Ted,
  Since 1.16, you can query CSV files with the new version of the CSV reader 
(backed by CompliantTextBatchReader); the usage is unchanged. However, this 
reader does not support your idea yet. I think we could add some code to 
enhance the reader. All new storage and format plugins are based on EVF, which 
is more powerful and stable.

> On May 20, 2021, at 10:40 PM, Ted Dunning wrote:
> 
> Luoc,
> 
> How do I use the CompliantTextBatchReader?
> 
> How is the speed?
> 
> Can you point me at the old CSV reader? I am not sure where it is.
> 
> 
> 
> On Thu, May 20, 2021 at 1:09 AM luoc  wrote:
> 
>> Hello Ted,
>> That's a nice idea. I did a quick review of the CSV reader but did not
>> find any settings for handling such errors. Also, we have refactored the
>> CSV format plugin to use EVF; please see CompliantTextBatchReader.java
>> (it complies with the RFC 4180 standard for text/csv files).
>> 
>>> On May 20, 2021, at 13:49, Ted Dunning wrote:
>>> 
>>> I have a csv file that causes an exception when read by Drill. The file
>> is
>>> slightly mal-formed (but R can read it).
>>> 
>>> Interestingly, if I don't parse the header line, I don't get the
>> exception
>>> and the problematic embedded quotes are handled well. Likewise, deleting
>>> the first data line (which is well-formed) causes the exception to go
>> away.
>>> Deleting the second data line also causes the exception to stop. Fixing
>> the
>>> quoting of the included quotes also fixes the problem. Swapping the lines
>>> works like deleting the first line. Repeating the first line after the
>>> second line still gets the exception.
>>> 
>>> The file is this:
>>> -
>>> 
>>> desc,name
>>> 
>>> "foo","x"
>>> 
>>> "manure called "foo"","y"
>>> 
>>> -
>>> 
>>> 
>>> The exception is shown below. My thought is that if the CSV file is
>>> considered mal-formed, we should get an error on the line that says
>>> something along the lines of "mal-formed input". Even better would be to
>>> allow such lines to be omitted (up to some sanity limit) or to parse it
>>> correctly (which happens without headers being parsed).
>>> 
>>> Anybody have any thoughts?
>>> 
>>> Here is the R behavior (it omits the embedded quotes):
>>> 
>>>> f = read.csv("v.csv")
>>> 
>>>> f
>>> 
>>>                desc name
>>> 1               foo    x
>>> 2 manure called foo    y
>>> 
>>> 
>>> And here is the exception:
>>> 
>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>> NegativeArraySizeException Please, refer to logs for more information.
>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>> (java.lang.NegativeArraySizeException) null
>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>> 
>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>> java.security.AccessController.doPrivileged():-2
>>> javax.security.auth.Subject.doAs():422
>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>> java.lang.Thread.run():748
>> 
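
The observation in the thread that fixing the quoting of the embedded quotes makes the problem go away matches RFC 4180, which requires a quote inside a quoted field to be doubled. A small sketch with Python's standard `csv` module (not Drill code) shows what the corrected file looks like; the row values are taken from the sample file in the thread.

```python
import csv
import io

# Rows from the sample file, with the embedded quotes kept in the value.
rows = [["desc", "name"], ["foo", "x"], ['manure called "foo"', "y"]]

# Write every field quoted; embedded quotes are doubled by default.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original values, quotes included.
parsed = list(csv.reader(io.StringIO(text)))
```

The third line is written as `"manure called ""foo""","y"`, which is the form a strict RFC 4180 reader expects.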



Re: [DISCUSS] Drill 1.19.0 release

2021-05-22 Thread luoc
Hi Laurent,
 It's time to release 1.19.0.

> On May 19, 2021, at 2:20 AM, Vitalii Diravka wrote:
> 
> Hi Laurent,
> DRILL-7871 requires additional time to be introduced and it is better to
> include it for the next release.
> DRILL-7904 is updated, I think it will be merged in a few days. But it
> doesn't matter whether it is included in this release or in the next one.
> 
> So we can plan to start the release process
> 
> 
> Kind regards
> Vitalii
> 
> 
> On Tue, May 11, 2021 at 7:52 PM Laurent Goujon  wrote:
> 
>> Thanks Vitalii
>> 
>> On Tue, May 11, 2021 at 9:29 AM Vitalii Diravka 
>> wrote:
>> 
>>> Hi Luoc!
>>> 
>>> They are almost ready. I plan to update PR for them today.
>>> 
>>> Kind regards
>>> Vitalii
>>> 
>>> 
>>> On Sat, May 8, 2021 at 5:26 PM luoc  wrote:
>>> 
>>>> Hi Vitalii,
>>>> Would you mind sharing an update? Is DRILL-7904 ready for review again?
>>>> And what is the status of DRILL-7871? Thanks.
>>>> 
>>>> On May 4, 2021, at 1:10 PM, Ted Dunning wrote:
>>>> 
>>>> Laurent,
>>>> 
>>>> I don't have a stake here, so can't really comment about specifics, but
>>> the
>>>> process is looking good.
>>>> 
>>>> 
>>>> 
>>>> On Mon, May 3, 2021 at 9:23 PM Laurent Goujon 
>>> wrote:
>>>> 
>>>> Thanks for all the answers
>>>> 
>>>> So the issues I found based on the feedback are:
>>>> 
>>>>  - DRILL-7878: Fix LGTM Alerts
>>>>  <https://issues.apache.org/jira/browse/DRILL-7878>
>>>>  - DRILL-7871: StoragePluginStore instances for different users
>>>>  <https://issues.apache.org/jira/browse/DRILL-7871>
>>>>  - DRILL-7908: Fix GitHub Actions CI
>>>>  <https://issues.apache.org/jira/browse/DRILL-7908>
>>>>  - DRILL-7904: Update to 30-jre Guava version
>>>>  <https://issues.apache.org/jira/browse/DRILL-7904>
>>>>  - DRILL-7826: Merge Pcap and Pcapng format plugin based on EVF
>>>>  <https://issues.apache.org/jira/browse/DRILL-7826>
>>>> - DRILL-7828: Refactor Pcap and Pcapng format plugin
>>>> <https://issues.apache.org/jira/browse/DRILL-7828>
>>>>  - DRILL-7910: Bumps commons-io from 2.4 to 2.7
>>>>  <https://issues.apache.org/jira/browse/DRILL-7910>
>>>>  - DRILL-7901: Bump junit from 4.12 to 4.13.1
>>>>  <https://issues.apache.org/jira/browse/DRILL-7901>
>>>> 
>>>> I wanted to propose Monday May 10th to do the first release candidate,
>>> but
>>>> I have some concerns about some of the changes which may not be ready
>> by
>>>> then considering they seem to involve some level of effort and are in
>>> very
>>>> early stage: The LGTM alert changes and the StoragePluginStore model
>>>> change. JUnit version update might also become quite a large change if
>>>> instead of moving to 4.13.1, Drill is switching to JUnit5.
>>>> 
>>>> What do people think?
>>>> 
>>>> On Sat, Apr 24, 2021 at 1:00 PM Vitalii Diravka 
>>>> wrote:
>>>> 
>>>> Hi Laurent,
>>>> 
>>>> I want to include:
>>>> DRILL-7871 <https://issues.apache.org/jira/browse/DRILL-7871>
>> (preparing
>>>> PR)
>>>> DRILL-7908 <https://issues.apache.org/jira/browse/DRILL-7908>
>> (preparing
>>>> PR)
>>>> DRILL-7904 <https://issues.apache.org/jira/browse/DRILL-7904> (PR is
>>>> opened, in review)
>>>> DRILL-7828 <https://issues.apache.org/jira/browse/DRILL-7828> (PR is
>>>> opened, review is almost completed)
>>>> 
>>>> All these tasks are expected to be completed in a week
>>>> 
>>>> Kind regards
>>>> Vitalii
>>>> 
>>>> 
>>>> On Fri, Apr 23, 2021 at 9:25 PM Charles Givre 
>> wrote:
>>>> 
>>>> Hi Laurent,
>>>> We have a few PRs pending which I'd like to see in the next version
>>>> 
>>>> which
>>>> 
>>>> are:
>>>> 1.  The update(s) and bug fixes to the Mongo plugin.
>>>> 2.  There is an extended PR for bug fixes which clean up a lot of
>>>> 
>>>> alerts
>>>> 
>>>> generated by LGTM
>>>> 3.  There are a few other library updates which are pending.
>>>> 4.  We have some work which changes the access model around storage
>>>> plugins which would be good for this release
>>>> 5.  The PCAP/PCAP-NG consolidation is awaiting review.
>>>> 
>>>> I think that's it.
>>>> -- C
>>>> 
>>>> On Apr 22, 2021, at 12:33 PM, Laurent Goujon 
>>>> 
>>>> wrote:
>>>> 
>>>> 
>>>> Hello everyone,
>>>> 
>>>> It has been more than 6 months since the last release, and I believe
>>>> 
>>>> this
>>>> 
>>>> would be a good time to discuss the next one.
>>>> 
>>>> As mentioned in a previous email thread, I am volunteering to be the
>>>> release manager, and I'm looking forward to working with the whole
>>>> 
>>>> community
>>>> 
>>>> to make another great release.
>>>> 
>>>> We have around 80 changes in master since the last release, and there
>>>> 
>>>> are
>>>> 
>>>> several changes open for review too. It would be nice if people could
>>>> 
>>>> reply
>>>> 
>>>> to this email and share issues which should be part of that release,
>>>> 
>>>> so
>>>> 
>>>> we
>>>> 
>>>> can decide on an initial cut-off date.
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Laurent
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 



Re: known bug in csv header parsing

2021-05-22 Thread luoc
Hi Ted,
  You can use this reader without any switch if you are on a recent version 
(preferably 1.19.0). The unit tests for the compliant text reader are in the 
`drill-java-exec` module, in the 
`org.apache.drill.exec.store.easy.text.compliant` package.

> On May 23, 2021, at 5:19 AM, Ted Dunning wrote:
> 
> Also, where would I find the unit tests for the compliant text reader?
> 
> I have a simple enough case to write a unit test, but I can't see any
> reference to the class in question outside of working code.
> 
> 
> On Thu, May 20, 2021 at 7:40 AM Ted Dunning  wrote:
> 
>> 
>> Luoc,
>> 
>> How do I use the CompliantTextBatchReader?
>> 
>> How is the speed?
>> 
>> Can you point me at the old CSV reader? I am not sure where it is.
>> 
>> 
>> 
>> On Thu, May 20, 2021 at 1:09 AM luoc  wrote:
>> 
>>> Hello Ted,
>>> That's a nice idea. I did a quick review of the CSV reader but did not
>>> find any settings for handling such errors. Also, we have refactored the
>>> CSV format plugin to use EVF; please see CompliantTextBatchReader.java
>>> (it complies with the RFC 4180 standard for text/csv files).
>>> 
>>>> On May 20, 2021, at 13:49, Ted Dunning wrote:
>>>> 
>>>> I have a csv file that causes an exception when read by Drill. The
>>> file is
>>>> slightly mal-formed (but R can read it).
>>>> 
>>>> Interestingly, if I don't parse the header line, I don't get the
>>> exception
>>>> and the problematic embedded quotes are handled well. Likewise, deleting
>>>> the first data line (which is well-formed) causes the exception to go
>>> away.
>>>> Deleting the second data line also causes the exception to stop. Fixing
>>> the
>>>> quoting of the included quotes also fixes the problem. Swapping the
>>> lines
>>>> works like deleting the first line. Repeating the first line after the
>>>> second line still gets the exception.
>>>> 
>>>> The file is this:
>>>> -
>>>> 
>>>> desc,name
>>>> 
>>>> "foo","x"
>>>> 
>>>> "manure called "foo"","y"
>>>> 
>>>> -
>>>> 
>>>> 
>>>> The exception is shown below. My thought is that if the CSV file is
>>>> considered mal-formed, we should get an error on the line that says
>>>> something along the lines of "mal-formed input". Even better would be to
>>>> allow such lines to be omitted (up to some sanity limit) or to parse it
>>>> correctly (which happens without headers being parsed).
>>>> 
>>>> Anybody have any thoughts?
>>>> 
>>>> Here is the R behavior (it omits the embedded quotes):
>>>> 
>>>>> f = read.csv("v.csv")
>>>> 
>>>>> f
>>>> 
>>>>                desc name
>>>> 1               foo    x
>>>> 2 manure called foo    y
>>>> 
>>>> 
>>>> And here is the exception:
>>>> 
>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>> NegativeArraySizeException Please, refer to logs for more information.
>>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>>> (java.lang.NegativeArraySizeException) null
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>>> 
>>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>>> java.security.AccessController.doPrivileged():-2
>>>> javax.security.auth.Subject.doAs():422
>>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>>> java.lang.Thread.run():748
>>> 
>> 



Re: known bug in csv header parsing

2021-05-23 Thread luoc


  Nice. That shows the power of Apache Drill.

> On May 23, 2021, at 10:18 AM, Ted Dunning wrote:
> 
> I was able to test using 1.18 and find that the problem is gone. I was
> unable to do a head to head test with 1.16, however, and couldn't figure
> out how to run 1.18 on the same machines as the current 1.16 environment
> without destabilizing that 1.16 environment (collision on the plugins
> directory). I didn't want to spend a lot of time so I will stick with the
> judgment that the current behavior seems to be correct.
> 
> Notably, the nested quotes are handled correctly without any quoting.
> 
> Nice.
> 
> On Sat, May 22, 2021 at 6:45 PM luoc  wrote:
> 
>> Hi Ted,
>> You can use this reader without any switch if you are on a recent
>> version (preferably 1.19.0). The unit tests for the compliant text reader
>> are in the `drill-java-exec` module, in the
>> `org.apache.drill.exec.store.easy.text.compliant` package.
>> 
>>> On May 23, 2021, at 5:19 AM, Ted Dunning wrote:
>>> 
>>> Also, where would I find the unit tests for the compliant text reader?
>>> 
>>> I have a simple enough case to write a unit test, but I can't see any
>>> reference to the class in question outside of working code.
>>> 
>>> 
>>> On Thu, May 20, 2021 at 7:40 AM Ted Dunning 
>> wrote:
>>> 
>>>> 
>>>> Luoc,
>>>> 
>>>> How do I use the CompliantTextBatchReader?
>>>> 
>>>> How is the speed?
>>>> 
>>>> Can you point me at the old CSV reader? I am not sure where it is.
>>>> 
>>>> 
>>>> 
>>>> On Thu, May 20, 2021 at 1:09 AM luoc  wrote:
>>>> 
>>>>> Hello Ted,
>>>>> That's a nice idea. I did a quick review of the CSV reader but did not
>>>>> find any settings for handling such errors. Also, we have refactored
>>>>> the CSV format plugin to use EVF; please see
>>>>> CompliantTextBatchReader.java (it complies with the RFC 4180 standard
>>>>> for text/csv files).
>>>>> 
>>>>>> On May 20, 2021, at 13:49, Ted Dunning wrote:
>>>>>> 
>>>>>> I have a csv file that causes an exception when read by Drill. The file is
>>>>>> slightly mal-formed (but R can read it).
>>>>>> 
>>>>>> Interestingly, if I don't parse the header line, I don't get the exception
>>>>>> and the problematic embedded quotes are handled well. Likewise, deleting
>>>>>> the first data line (which is well-formed) causes the exception to go away.
>>>>>> Deleting the second data line also causes the exception to stop. Fixing the
>>>>>> quoting of the included quotes also fixes the problem. Swapping the lines
>>>>>> works like deleting the first line. Repeating the first line after the
>>>>>> second line still gets the exception.
>>>>>> 
>>>>>> The file is this:
>>>>>> -
>>>>>> 
>>>>>> desc,name
>>>>>> 
>>>>>> "foo","x"
>>>>>> 
>>>>>> "manure called "foo"","y"
>>>>>> 
>>>>>> -
>>>>>> 
>>>>>> 
>>>>>> The exception is shown below. My thought is that if the CSV file is
>>>>>> considered mal-formed, we should get an error on the line that says
>>>>>> something along the lines of "mal-formed input". Even better would be to
>>>>>> allow such lines to be omitted (up to some sanity limit) or to parse it
>>>>>> correctly (which happens without headers being parsed).
>>>>>> 
>>>>>> Anybody have any thoughts?
>>>>>> 
>>>>>> Here is the R behavior (it omits the embedded quotes):
>>>>>> 
>>>>>>> f = read.csv("v.csv")
>>>>>> 
>>>>>>> f
>>>>>> 
>>>>>>                desc name
>>>>>> 1               foo    x
>>>>>> 2 manure called foo    y
>>>>>> 
>>>>>> 
>>>>>> And here is the exception:
>>>>>> 
>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>>> NegativeArraySizeException Please, refer to logs for more information.
>>>>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>>>>> (java.lang.NegativeArraySizeException) null
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>>>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>>>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>>>>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>>>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>>>>> java.security.AccessController.doPrivileged():-2
>>>>>> javax.security.auth.Subject.doAs():422
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>>>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>>>>> java.lang.Thread.run():748
>>>>> 
>>>> 
>> 
>> 
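
For reference, the quoting problem discussed in this thread can be reproduced outside Drill. The following is only a minimal sketch using Python's standard `csv` module, not Drill's CompliantTextBatchReader. It assumes RFC 4180 style escaping, where an embedded quote is written as two quotes, and it reuses the sample rows from Ted's file. The helper names `to_rfc4180` and `is_well_formed` are made up for illustration.

```python
import csv
import io


def to_rfc4180(rows):
    """Serialize rows with every field quoted and embedded quotes doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()


def is_well_formed(line):
    """Return True if a single CSV line parses under the strict dialect."""
    try:
        list(csv.reader([line], strict=True))
        return True
    except csv.Error:
        return False


rows = [["desc", "name"], ["foo", "x"], ['manure called "foo"', "y"]]
text = to_rfc4180(rows)
print(text)

# The line from the original file, with the inner quotes left unescaped.
bad_line = '"manure called "foo"","y"'
print(is_well_formed(bad_line))
print(is_well_formed('"manure called ""foo""","y"'))
```

A strict check like this is one way to get the "mal-formed input" style of error Ted asks for, instead of a failure much later in the pipeline. Whether Drill should reject, skip, or leniently parse such lines is the open question in the thread.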



Re: [DISCUSS] Drill 1.19.0 release

2021-05-23 Thread luoc
Hi Charles,
  All right, we'll be expecting the update.

> On May 24, 2021, at 12:13 AM, Charles Givre wrote:
> 
> Hi Luoc, 
> We still have a few PRs pending that we really should get into Drill 1.19.  
> The main one is the junit upgrade.  There are a few critical CVEs associated 
> with that, so I do think it is important to get that one merged.  I think 
> Vitalii will have that one done in short order. 
> Best,
> -- C 
> 
> 
> 
>> On May 22, 2021, at 5:16 AM, luoc  wrote:
>> 
>> Hi Laurent,
>> It’s time to do a release with 1.19.0.
>> 
>>> On May 19, 2021, at 2:20 AM, Vitalii Diravka wrote:
>>> 
>>> Hi Laurent,
>>> DRILL-7871 requires additional time to be introduced and it is better to
>>> include it for the next release.
>>> DRILL-7904 is updated, I think it will be merged in a few days. But it
>>> doesn't matter whether it is included in this release or in the next one.
>>> 
>>> So we can plan to start the release process
>>> 
>>> 
>>> Kind regards
>>> Vitalii
>>> 
>>> 
>>> On Tue, May 11, 2021 at 7:52 PM Laurent Goujon  wrote:
>>> 
>>>> Thanks Vitalii
>>>> 
>>>> On Tue, May 11, 2021 at 9:29 AM Vitalii Diravka 
>>>> wrote:
>>>> 
>>>>> Hi Luoc!
>>>>> 
>>>>> They are almost ready. I plan to update PR for them today.
>>>>> 
>>>>> Kind regards
>>>>> Vitalii
>>>>> 
>>>>> 
>>>>> On Sat, May 8, 2021 at 5:26 PM luoc  wrote:
>>>>> 
>>>>>> Hi Vitalii,
>>>>>> Would you mind sharing whether DRILL-7904 is ready for review again?
>>>>>> And what’s the status of DRILL-7871? Thanks.
>>>>>> 
>>>>>> On May 4, 2021, at 1:10 PM, Ted Dunning wrote:
>>>>>> 
>>>>>> Laurent,
>>>>>> 
>>>>>> I don't have a stake here, so can't really comment about specifics, but
>>>>>> the process is looking good.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, May 3, 2021 at 9:23 PM Laurent Goujon 
>>>>> wrote:
>>>>>> 
>>>>>> Thanks for all the answers
>>>>>> 
>>>>>> So the issues I found based on the feedback are:
>>>>>> 
>>>>>> - DRILL-7878: Fix LGTM Alerts
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7878>
>>>>>> - DRILL-7871: StoragePluginStore instances for different users
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7871>
>>>>>> - DRILL-7908: Fix GitHub Actions CI
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7908>
>>>>>> - DRILL-7904: Update to 30-jre Guava version
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7904>
>>>>>> - DRILL-7826: Merge Pcap and Pcapng format plugin based on EVF
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7826>
>>>>>>   - DRILL-7828: Refactor Pcap and Pcapng format plugin
>>>>>>   <https://issues.apache.org/jira/browse/DRILL-7828>
>>>>>> - DRILL-7910: Bumps commons-io from 2.4 to 2.7
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7910>
>>>>>> - DRILL-7901: Bump junit from 4.12 to 4.13.1
>>>>>> <https://issues.apache.org/jira/browse/DRILL-7901>
>>>>>> 
>>>>>> I wanted to propose Monday May 10th to do the first release candidate, but
>>>>>> I have some concerns about some of the changes which may not be ready by
>>>>>> then considering they seem to involve some level of effort and are in very
>>>>>> early stage: The LGTM alert changes and the StoragePluginStore model
>>>>>> change. JUnit version update might also become quite a large change if
>>>>>> instead of moving to 4.13.1, Drill is switching to JUnit5.
>>>>>> 
>>>>>> What do people think?
>>>>>> 
>>>>>> On Sat, Apr 24, 2021 at 1:00 PM Vitalii Diravka 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Laurent,
>>>>>> 
>>>>>> I want to include:
>>>>>> 

Re: Release and GPG key

2021-05-24 Thread luoc
Hi guys,
  Please let me know if you need assistance.

> On May 25, 2021, at 08:20, Ted Dunning wrote:
> 
> I would be happy to do this. My old Apache key is still live, but it isn't
> in the KEYS file yet. I can add it easily enough.
> 
> One quick note. The fact that a key is in the KEYS file is enough of a web
> of trust in Apache. This is because only a committer can put it there.
> There is a further cross check with the SVN file.
> 
> It is a very nice thing to do, however, to cross-sign keys. It is also a
> very tricky thing to do during COVID times.
> 
> I will go ahead and cross sign Laurent's key once we have the phone call so
> that we have a bit of traceability this time.
> 
> 
> 
> 
>> On Mon, May 24, 2021 at 4:45 PM Laurent Goujon  wrote:
>> 
>> Yes, I was thinking of doing a zoom meeting where I would show proof of id
>> + key id. Especially because of Covid, that seems the easiest option.
>> 
>>> On Mon, May 24, 2021, 16:08 Ted Dunning  wrote:
>>> 
>>> Laurent,
>>> 
>>> The critical question here is how you can substantiate this key. In
>> person,
>>> with a government ID, this would be easy.
>>> 
>>> Do you know a committer personally who could vouch for you? Would you be
>>> interested in having a video call where you can present some ID?
>>> 
>>> On Mon, May 24, 2021 at 3:24 PM Laurent Goujon 
>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I opened a pull request to add my public GPG keys to the KEYS file at the
>>>> root of the project:
>>>> https://github.com/apache/drill/pull/2234
>>>> 
>>>> Sadly this key is not part of the Web Of Trust, and I would need someone
>>>> part of it to validate my key. And also a PMC member to add it to the Drill
>>>> release SVN repository.
>>>> 
>>>> Anybody interested?
>>>> 
>>>> Laurent
>>>> 
>>> 
>> 



Re: [DISCUSS] Drill 1.19.0 release

2021-05-28 Thread luoc
e
>>>>> time,
>>>>>> I think we held the window open for merging the changes for a very
>>> long
>>>>>> time. Unless there's objection, I'm planning to merge the Guava and
>>>>>> Jetty/Hadoop pull requests later today, and doing the first RC for
>>> Drill
>>>>>> 1.19.0
>>>>>> 
>>>>>> Here are the pull request links:
>>>>>> * https://github.com/apache/drill/pull/2202
>>>>>> * https://github.com/apache/drill/pull/2236
>>>>>> 
>>>>>> Laurent
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 26, 2021 at 11:59 AM Laurent Goujon 
>>>>> wrote:
>>>>>> 
>>>>>>> After several retries, the Guava checks successfully passed:
>>>>>>> https://github.com/apache/drill/pull/2202
>>>>>>> 
>>>>>>> Charles, can we proceed on merging your change?
>>>>>>> 
>>>>>>> Laurent
>>>>>>> 
>>>>>>> On Tue, May 25, 2021 at 10:24 PM Laurent Goujon 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Just an update. There's a patch for updating both Jetty and Hadoop
>>> (at
>>>>>>>> the same time) as those changes are co-dependent:
>>>>>>>> https://github.com/apache/drill/pull/2236
>>>>>>>> 
>>>>>>>> As for the Guava patch, I'd be happy to help, but I'm not sure
>>> what's
>>>>>>>> left. As far as I can tell the shaded version of Guava has been
>>>>> updated,
>>>>>>>> but the build is failing. The security vulnerabilities for Guava are
>>>>>>>> moderate (and actually it seems a fix for CVE-2020-8908 would require a
>>>>>>>> code change instead of a Guava update).
>>>>>>>> 
>>>>>>>> Since this has been almost a month since we started this release
>>>>> process,
>>>>>>>> I wonder if we still want to wait on this patch, or if we should
>>> move
>>>>> it to
>>>>>>>> the next release.
>>>>>>>> 
>>>>>>>> Let me know what people think,
>>>>>>>> 
>>>>>>>> On Tue, May 25, 2021 at 8:24 AM Laurent Goujon 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Anything I can help with?
>>>>>>>>> 
>>>>>>>>> On Tue, May 25, 2021 at 7:02 AM Charles Givre 
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Laurent,
>>>>>>>>>> My apologies.  I said JUnit when I meant the Guava PR
>>>>>>>>>> (https://github.com/apache/drill/pull/2202).  I think this one is
>>>>>>>>>> almost done as well.
>>>>>>>>>> -- C
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On May 24, 2021, at 5:29 PM, Laurent Goujon 
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Ok, I was hoping that some of the PRs could be merged, but if we
>>> are
>>>>>>>>>> in
>>>>>>>>>>> agreement, let's start the work :)
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, May 23, 2021 at 6:52 PM luoc  wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Charles,
>>>>>>>>>>>> All right, we'll be expecting the update.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 24, 2021, at 12:13 AM, Charles Givre wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Luoc,
>>>>>>>>>>>>> We still have a few PRs pending that we really should get into
>>>>> Drill
>>>>>>&

Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-05-31 Thread luoc


Thanks, Ted. Actually, a developer has already added the .asf.yaml file to the 
`gh-pages` branch, but it has not been deployed yet.

> On May 31, 2021, at 22:43, Ted Dunning wrote:
> 
> Drill is on this list.
> 
> I think that the fix is relatively trivial, but haven't examined it
> carefully.
> 
> -- Forwarded message -
> From: Daniel Gruno 
> Date: Mon, May 31, 2021 at 6:41 AM
> Subject: [NOTICE] Git web site publishing to be done via .asf.yaml only as
> of July 1st
> To: Users 
> 
> 
> TL;DR: if your project web site is kept in subversion, disregard this
> email please. If your project web site is using git, and you have not
> deployed it via .asf.yaml, you MUST switch before July 1st or risk your
> web site goes stale.
> 
> 
> 
> Dear Apache projects,
> In order to simplify our web site publishing services and improve
> self-serve for projects and stability of deployments, we will be turning
> off the old 'gitwcsub' method of publishing git web sites. As of this
> moment, this involves 120 web sites. All web sites should switch to our
> self-serve method of publishing via the .asf.yaml meta-file. We aim to
> turn off gitwcsub around July 1st.
> 
> 
> ## How to publish via .asf.yaml:
> Publishing via .asf.yaml is described at:
> https://s.apache.org/asfyamlpublishing
> You can also see an example .asf.yaml with publishing and staging
> profiles for our own infra web site at:
> https://github.com/apache/infrastructure-website/blob/asf-site/.asf.yaml
> 
> In short, one puts a file called .asf.yaml into the branch that needs to
> be published as the project's web site, with the following two-line
> content, in this case assuming the published branch is 'asf-site':
> 
> publish:
>   whoami: asf-site
> 
> 
> It is important to note that the .asf.yaml file MUST be present at the
> root of the file system in the branch you wish to publish. The 'whoami'
> parameter acts as a guard, ensure that only the intended branch is used
> for publishing.
> 
> 
> ## Is my project affected by this?
> The quickest way to check if you need to switch to a .asf.yaml approach
> is to check out site source page at
> https://infra-reports.apache.org/site-source/ - if your site is listed
> in yellow, you will need to switch. This page will also tell you which
> branch you are currently publishing as your web site. This is (should
> be) the branch that you must add a .asf.yaml meta file to.
> 
> The web site source list updates every hour. If your project site
> appears in green, you are already using .asf.yaml for publishing and do
> not need to make any changes.
> 
> 
> ## What happens if we miss the deadline?
> If you miss the deadline, don't fret. Your site will of course still
> remain online as is, but new updates will not appear till you
> create/edit the .asf.yaml and set up publishing.
> 
> 
> ## Who do we contact if we have questions?
> Please contact us at us...@infra.apache.org if you have any additional
> questions.
> 
> 
> With regards,
> Daniel on behalf of ASF Infra.
> 
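
The `whoami` guard described in the forwarded notice can be pictured with a small sketch. This is only an illustration of the idea in Python, not ASF Infra's actual implementation. The function names `parse_publish_whoami` and `should_publish` are hypothetical, and the tiny parser handles only the two-line shape quoted in the notice.

```python
def parse_publish_whoami(asf_yaml_text):
    """Extract publish.whoami from the minimal two-line .asf.yaml shape.

    Hypothetical helper: a real deployment would use a full YAML parser.
    """
    in_publish = False
    for raw in asf_yaml_text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue
        if not raw.startswith((" ", "\t")):
            # A top-level key starts or ends the publish section.
            in_publish = line == "publish:"
            continue
        key, _, value = line.strip().partition(":")
        if in_publish and key == "whoami":
            return value.strip()
    return None


def should_publish(asf_yaml_text, current_branch):
    """Publish only when the file's whoami names the branch it lives in."""
    return parse_publish_whoami(asf_yaml_text) == current_branch


example = "publish:\n  whoami: asf-site\n"
print(should_publish(example, "asf-site"))  # guard matches the branch
print(should_publish(example, "gh-pages"))  # guard refuses another branch
```

The point of the guard is that a copy of the file merged into another branch by accident does not cause that branch to be published.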



Re: [Attn] Drill 1.19.0 release - master tree is frozen

2021-05-31 Thread luoc


Has it started, Laurent? I want to merge the last PR now.

> On Jun 1, 2021, at 12:25 PM, Laurent Goujon wrote:
> 
> Hi,
> 
> In preparation for the 1.19.0 release, master tree is currently frozen
> until the release process is completed. For committers, until the release
> is over and Drill version is changed to 1.20.0-SNAPSHOT, please do not push
> any changes into Drill master.
> 
> Cheers,
> 
> Laurent



Re: [Attn] Drill 1.19.0 release - master tree is frozen

2021-06-01 Thread luoc


DRILL-7928. Please merge it, thanks.

> On Jun 1, 2021, at 11:29 PM, Laurent Goujon wrote:
> 
> Technically, yes, it has started. Which PR? This wasn't sent to the mailing
> list
> 
> On Mon, May 31, 2021 at 10:51 PM luoc  wrote:
> 
>> 
>> Has it started, Laurent? I want to merge the last PR now.
>> 
>>> On Jun 1, 2021, at 12:25 PM, Laurent Goujon wrote:
>>> 
>>> Hi,
>>> 
>>> In preparation for the 1.19.0 release, master tree is currently frozen
>>> until the release process is completed. For committers, until the release
>>> is over and Drill version is changed to 1.20.0-SNAPSHOT, please do not
>> push
>>> any changes into Drill master.
>>> 
>>> Cheers,
>>> 
>>> Laurent
>> 
>> 



Re: [VOTE] Release Apache Drill 1.19.0 - RC0

2021-06-01 Thread luoc


VOTE +1

> On Jun 2, 2021, at 05:42, Laurent Goujon wrote:
> 
> Hi all,
> 
> I'd like to propose the first release candidate (RC0) of Apache Drill,
> version 1.19.0.
> The release candidate covers a total of 105 resolved JIRAs [1]. Thanks
> to everyone who contributed to this release.
> The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted at [3].
> This release candidate is based on commit
> ad3f344ac21e0462aa82f51f648a21a0554cf368 located at [4].
> Please download and try out the release.
> 
> The vote ends at 5 PM UTC (9 AM PDT, 7 PM EET, 10:30 PM IST), June 4, 2021.
> 
> [ ] +1
> [ ] +0
> [ ] -1
> Here's my vote: +1
> Laurent
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12348331
> [2] https://home.apache.org/~laurent/drill/releases/1.19.0/rc0/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1083/
> [4] https://github.com/laurentgo/drill/commits/drill-1.19.0


Re: [VOTE] Release Apache Drill 1.19.0 - RC0

2021-06-03 Thread luoc


DRILL-7940, too

> On Jun 3, 2021, at 19:57, Charles Givre wrote:
> 
> -1 (Binding)
> 
> I'd agree with Nick.  Drill-7937 should be included in this release.
> -- C
> 
>> On Jun 2, 2021, at 9:25 AM, Nick Stenroos-Dam  wrote:
>> 
>> Vote -1
>> 
>> Can we please include  DRILL-7937



Re: [VOTE] Release Apache Drill 1.19.0 - RC0

2021-06-03 Thread luoc


DRILL-7945 has blocked the release, so I'm ready to merge DRILL-7937 and 
DRILL-7940 as bug fixes.

> On Jun 4, 2021, at 01:15, Laurent Goujon wrote:
> 
> Hey guys,
> 
> Can we please stop changing the goal post again and again? The fact that
> some of those pull requests are ready to merge should not be the sole
> consideration when to do a next release candidate.
> 
> I've been asking several times on this mailing list about what we want to
> include or not, and we got an agreement several times about it, and several
> times we are now having this conversation.
> IMHO, I would not include DRILL-7941, DRILL-7942 and DRILL-7943: those are
> new enhancements impacting Drill tests (not even the main product) and I do
> not understand the rush in making them part of the release. Specifically
> for the JUnit 5 update, I think the change is misleading because it looks
> like it's only the introduction of JUnit5 in one test class and everything
> else still uses JUnit 4, so I would hardly call it an upgrade...
> 
> As for DRILL-7937 and DRILL-7940, the issues were opened in the last 3 days,
> but they do not seem to be regressions since 1.18.0, just gaps in what
> Drill provides. Personally, since we are this deep in the release, I would
> skip these too. But if people have more context on those, maybe
> we can agree they should be merged?
> 
> Laurent
> 
> 
>> On Thu, Jun 3, 2021 at 6:10 AM Charles Givre  wrote:
>> 
>> There are like 5 minor PRs that are approved and awaiting merge.  I'd vote
>> that we include them.  Specifically:
>> 
>> DRILL-7943: Update Hamcrest
>> DRILL-7942: Update Mockito
>> DRILL-7941: Update junit to 5.7.2
>> DRILL-7937:  Parquet decimal error
>> DRILL-7940: Fix Kafka Key
>> 
>> These are all approved and can be merged.
>> 
>> -- C
>> 
>>>> On Jun 3, 2021, at 9:01 AM, luoc  wrote:
>>> 
>>> 
>>> DRILL-7940, too
>>> 
>>>> On Jun 3, 2021, at 19:57, Charles Givre wrote:
>>>> 
>>>> -1 (Binding)
>>>> 
>>>> I'd agree with Nick.  Drill-7937 should be included in this release.
>>>> -- C
>>>> 
>>>>> On Jun 2, 2021, at 9:25 AM, Nick Stenroos-Dam  wrote:
>>>>> 
>>>>> Vote -1
>>>>> 
>>>>> Can we please include  DRILL-7937
>>> 
>> 
>> 



Re: [VOTE] Release Apache Drill 1.19.0 - RC0

2021-06-03 Thread luoc
Laurent,
  Thanks for doing this. RC0 is no longer eligible to move to the next step. 
The consensus is that we cannot release a version with known issues (pull 
requests marked as `bug`). Admittedly, Drill's release process is not friendly, 
and we will hold that discussion after the release. For now, our focus is on 
preparing RC1. BTW, you're doing great.

> On Jun 4, 2021, at 1:20 PM, Laurent Goujon wrote:
> 
> You actually went ahead and merged those patches without waiting while I
> was hoping we could get some consensus first :(
> 
> Can I just ask you to please respect the effort I'm putting in following
> what I think is the release process? If people think I'm not following the
> proper steps or that I'm not doing a good job at doing it, I'll gladly
> accept feedback and will do my best to address it, but going over me isn't
> helping me or the future volunteers for the next releases which might be
> also wondering what the release process should be.
> Meanwhile I'll wait to get a review for the DRILL-7945 patch fixing the
> Guava regression, and hopefully I should be able to do another release
> candidate tomorrow.
> 
> Laurent
> 
> On Thu, Jun 3, 2021 at 5:46 PM luoc  wrote:
> 
>> 
>> The DRILL-7945 blocked the release. So, I'm ready to merge the DRILL-7937
>> and DRILL-7940 for bugfix.
>> 
>>> On Jun 4, 2021, at 01:15, Laurent Goujon wrote:
>>> 
>>> Hey guys,
>>> 
>>> Can we please stop changing the goal post again and again? The fact that
>>> some of those pull requests are ready to merge should not be the sole
>>> consideration when to do a next release candidate.
>>> 
>>> I've been asking several times on this mailing list about what we want to
>>> include or not, and we got an agreement several times about it, and
>> several
>>> times we are now having this conversation.
>>> IMHO, I would not include DRILL-7941, DRILL-7942 and DRILL-7943: those
>> are
>>> new enhancements impacting Drill tests (not even the main product) and I
>> do
>>> not understand the rush in making them part of the release. Specifically
>>> for the JUnit 5 update, I think the change is misleading because it looks
>>> like it's only the introduction of JUnit5 in one test class and
>> everything
>>> else still uses JUnit 4, so I would hardly call it an upgrade...
>>> 
>>> As for DRILL-7937 and DRILL-7940, the issues were open in the last 3 days
>>> ago, but they do not seem to be regressions since 1.18.0, just gaps in
>> what
>>> Drill provides. Personally since we are this deep in the release, I would
>>> also skip these one too. But if people have more contexts on those, maybe
>>> we can agree they should be merged?
>>> 
>>> Laurent
>>> 
>>> 
>>>> On Thu, Jun 3, 2021 at 6:10 AM Charles Givre  wrote:
>>>> 
>>>> There are like 5 minor PRs that are approved and awaiting merge.  I'd
>> vote
>>>> that we include them.  Specifically:
>>>> 
>>>> DRILL-7943: Update Hamcrest
>>>> DRILL-7942: Update Mockito
>>>> DRILL-7941: Update junit to 5.7.2
>>>> DRILL-7937:  Parquet decimal error
>>>> DRILL-7940: Fix Kafka Key
>>>> 
>>>> These are all approved and can be merged.
>>>> 
>>>> -- C
>>>> 
>>>>>> On Jun 3, 2021, at 9:01 AM, luoc  wrote:
>>>>> 
>>>>> 
>>>>> DRILL-7940, too
>>>>> 
>>>>>> On Jun 3, 2021, at 19:57, Charles Givre wrote:
>>>>>> 
>>>>>> -1 (Binding)
>>>>>> 
>>>>>> I'd agree with Nick.  Drill-7937 should be included in this release.
>>>>>> -- C
>>>>>> 
>>>>>>> On Jun 2, 2021, at 9:25 AM, Nick Stenroos-Dam 
>> wrote:
>>>>>>> 
>>>>>>> Vote -1
>>>>>>> 
>>>>>>> Can we please include  DRILL-7937
>>>>> 
>>>> 
>>>> 
>> 
>> 



Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-06-04 Thread luoc


Hi Ted. All done, thanks again.

> On May 31, 2021, at 23:20, luoc wrote:
> 
> 
> Thanks, Ted. Actually, a developer has already added the .asf.yaml file to the
> `gh-pages` branch, but it has not been deployed yet.
> 
>> On May 31, 2021, at 22:43, Ted Dunning wrote:
>> 
>> Drill is on this list.
>> 
>> I think that the fix is relatively trivial, but haven't examined it
>> carefully.
>> 
>> -- Forwarded message -
>> From: Daniel Gruno 
>> Date: Mon, May 31, 2021 at 6:41 AM
>> Subject: [NOTICE] Git web site publishing to be done via .asf.yaml only as
>> of July 1st
>> To: Users 
>> 
>> 
>> TL;DR: if your project web site is kept in subversion, disregard this
>> email please. If your project web site is using git, and you have not
>> deployed it via .asf.yaml, you MUST switch before July 1st or risk your
>> web site goes stale.
>> 
>> 
>> 
>> Dear Apache projects,
>> In order to simplify our web site publishing services and improve
>> self-serve for projects and stability of deployments, we will be turning
>> off the old 'gitwcsub' method of publishing git web sites. As of this
>> moment, this involves 120 web sites. All web sites should switch to our
>> self-serve method of publishing via the .asf.yaml meta-file. We aim to
>> turn off gitwcsub around July 1st.
>> 
>> 
>> ## How to publish via .asf.yaml:
>> Publishing via .asf.yaml is described at:
>> https://s.apache.org/asfyamlpublishing
>> You can also see an example .asf.yaml with publishing and staging
>> profiles for our own infra web site at:
>> https://github.com/apache/infrastructure-website/blob/asf-site/.asf.yaml
>> 
>> In short, one puts a file called .asf.yaml into the branch that needs to
>> be published as the project's web site, with the following two-line
>> content, in this case assuming the published branch is 'asf-site':
>> 
>> publish:
>>  whoami: asf-site
>> 
>> 
>> It is important to note that the .asf.yaml file MUST be present at the
>> root of the file system in the branch you wish to publish. The 'whoami'
>> parameter acts as a guard, ensure that only the intended branch is used
>> for publishing.
>> 
>> 
>> ## Is my project affected by this?
>> The quickest way to check if you need to switch to a .asf.yaml approach
>> is to check out site source page at
>> https://infra-reports.apache.org/site-source/ - if your site is listed
>> in yellow, you will need to switch. This page will also tell you which
>> branch you are currently publishing as your web site. This is (should
>> be) the branch that you must add a .asf.yaml meta file to.
>> 
>> The web site source list updates every hour. If your project site
>> appears in green, you are already using .asf.yaml for publishing and do
>> not need to make any changes.
>> 
>> 
>> ## What happens if we miss the deadline?
>> If you miss the deadline, don't fret. Your site will of course still
>> remain online as is, but new updates will not appear till you
>> create/edit the .asf.yaml and set up publishing.
>> 
>> 
>> ## Who do we contact if we have questions?
>> Please contact us at us...@infra.apache.org if you have any additional
>> questions.
>> 
>> 
>> With regards,
>> Daniel on behalf of ASF Infra.
>> 



Re: [RESULT] [VOTE] Release Apache Drill 1.19.0 RC1

2021-06-10 Thread luoc
Hi Charles,
  Can we skip the 2nd item on the to-do list? I don't think we can reach the 
owner of the Twitter account right now. Also, could you please help with the 
1st and 3rd items? James and I are reviewing the docs update.

> On Jun 10, 2021, at 15:42, Laurent Goujon wrote:
> 
> Thanks Vova.
> 
> I checked that the docker image and the maven artifacts have been published.
> 
> I also created a pull request for updating the website:
> https://github.com/apache/drill/pull/2257/commits
> It contains the javadoc changes, but the commits are separated so it should
> be easy for people to review the changes and the blog announcement. Please
> let me know if I forgot an important feature.
> 
> Still looking at the release documentation (
> https://github.com/apache/drill/blob/master/docs/dev/Release.md), there are
> a couple of things it seems I won't be able to do but would expect a PMC to
> do:
> * Updating the current release date in JIRA and creating a new release
> * Post the announcement on Twitter
> * Post the release date on https://reporter.apache.org/addrelease.html?drill
> 
> As for the website update, can someone confirm that those instructions at
> https://github.com/apache/drill/blob/gh-pages/README.md#uploading-to-the-apache-website-drill-committers-only
> are still correct?
> 
> Laurent
> 
> 
>> On Wed, Jun 9, 2021 at 2:44 PM Vova Vysotskyi  wrote:
>> 
>> Good news, that issue was resolved so I have published artifacts:
>> https://dist.apache.org/repos/dist/release/drill/drill-1.19.0/.
>> 
>> We will have to update the release instruction since now it would work
>> only with moving artifacts from dist/dev to dist/release
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
>> 
>>> On 2021/06/09 20:16:12, Ted Dunning  wrote:
>>> Thanks.
>>> 
>>> I figured you would be ahead of me on this.
>>> 
>>> On Wed, Jun 9, 2021 at 12:27 PM  wrote:
>>> 
>>>> Hello Ted,
>>>> 
>>>> Yes, initially I tried both options.
>>>> I have also left a comment on the ticket, hope it will be resolved soon.
>>>> 
>>>> Kind regards,
>>>> Volodymyr Vysotskyi
>>>> 
>>>> On 2021/06/09 19:04:02, Ted Dunning  wrote:
>>>>> Vova,
>>>>> 
>>>>> Gavin responded on INFRA-21981 to the effect that upload should go to
>>>>> the dev side and then svn mv should be used to move to the release side.
>>>>> 
>>>>> Is that what you tried to do?
>>>>> 
>>>>> On Wed, Jun 9, 2021 at 10:25 AM  wrote:
>>>>> 
>>>>>> I have some issues, will deploy after
>>>>>> https://issues.apache.org/jira/browse/INFRA-21981 is fixed.
>>>>>> 
>>>>>> On 2021/06/09 16:27:12, vo...@apache.org wrote:
>>>>>>> Hello Laurent,
>>>>>>> 
>>>>>>> I’ll publish them later today.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Volodymyr Vysotskyi
>>>>>>> 
>>>>>>> On 2021/06/09 04:39:50, Laurent Goujon  wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> May I kindly ask for a PMC to push the RC1 artifacts to the dist
>>>>>>>> repository per instructions at
>>>>>>>> https://github.com/apache/drill/blob/master/docs/dev/Release.md?
>>>>>>>> 
>>>>>>>> The artifacts are available at
>>>>>>>> https://home.apache.org/~laurent/drill/releases/1.19.0/rc1/
>>>>>>>> 
>>>>>>>> Laurent
>>>>>>>> 
>>>>>>>> On Tue, Jun 8, 2021 at 9:36 PM Laurent Goujon <la...@dremio.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> The vote passes. Thanks to everyone who has tested the release
>>>>>>>>> candidate and given their comments and votes. Final tally:
>>>>>>>>> 
>>>>>>>>> 3x +1 (binding): Laurent, Ted, Vova
>>>>>>>>> 
>>>>>>>>> No 0s or -1s.
>>>>>>>>> 
>>>>>>>>> I'll start the process for pushing the release artifacts and send an
>>>>>>>>> announcement once propagated.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> 
>>>>>>>>> Laurent



Re: Multilingual support for the documentation

2021-06-22 Thread luoc
+1 
That would be a great start.

> On Jun 22, 2021, at 21:31, James Turton wrote:
> 
> Hi all
> 
> Based on an initiative of Cong Luo's, I've just implemented support for doc 
> pages in multiple languages on the Drill website.  There are currently a 
> handful of farcically translated pages in Simplified Chinese which I put 
> there for demonstration purposes and which can be found by using the new 
> Language selector in the top menu.
> 
> Instructions for adding translated pages have been added to README.md in the 
> gh-pages branch.  Now all that's missing is an army of translators so please 
> encourage your multilingual friends.  With any luck this will remove a 
> barrier to entry for many prospective Drill users.
> 
> Regards
> James



Re: [jira] [Created] (DRILL-7965) mercari fees Technology Trade Show

2021-07-02 Thread luoc


James, is there a way to send it off to Mars?

> On Jul 2, 2021, at 21:03, James Turton wrote:
> 
> First time I've seen spam planted in JIRA tickets.  Welcome to the future 🤮.
> 
>> On 2021/07/02 14:19, Cheryl Valentine (Jira) wrote:
>> Cheryl Valentine created DRILL-7965:
>> ---
>> 
>>  Summary: mercari fees Technology Trade Show
>>  Key: DRILL-7965
>>  URL:https://issues.apache.org/jira/browse/DRILL-7965
>>  Project: Apache Drill
>>   Issue Type: Bug
>> Reporter: Cheryl Valentine



Re: Parquet compression codecs and bundling

2021-07-13 Thread luoc


James,
  Good question. I prefer to keep the out-of-the-box experience for Drill users. 
The key point is that each bundled codec's license must be acceptable under 
the Apache License.

> On Jul 9, 2021, at 17:57, James Turton wrote:
> 
> Hi
> 
> I'm looking for advice on a "to bundle or not to bundle" question for a PR 
> I'm working on which enables the reading and writing of all of the 
> compression codecs standardised for Parquet.  That amounts to adding support 
> for LZO, LZ4, Brotli and Zstandard.
> 
> Apart from some minor code changes in Drill in itself, users will obviously 
> also need implementations of each codec and we don't currently bundle all of 
> the aforementioned.  In cases where native codec libs are involved then I 
> guess platform specifics would become a consideration but let's gloss over 
> that for now.
> 
> In the case of LZO I believe that a GPL license applies and I don't think it 
> can ever be bundled (but we can still enable it and provide instructions for 
> users to add it to their installations themselves).  In the case of Brotli 
> there is an Apache-licensed implementation that we can bundle if we don't 
> mind adding a 750KB JAR file.
> 
> So my question is: should I bundle all of the codecs that I can, making 
> things work out of the box but adding to the size of the distributable?  Or 
> should I put in documentation and error messages that instruct users to get 
> the codecs themselves instead?
> 
> Thanks
> James



Re: HBase Connectivity with Drill

2021-07-26 Thread luoc

Hi Ramu,
  I'm not using Drill with Kerberos at the moment, but I found that some Drill 
users run the Hive plugin with Kerberos using a configuration like this:


{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://:9083",
    "fs.default.name": "hdfs:///",
    "hive.server2.enable.doAs": "false",
    "hive.metastore.sasl.enabled": "true",
    "hive.metastore.kerberos.principal": ""
  }
}


  I think the HBase equivalent might look like this:

{
  "config": {
    "hbase.client.keytab.file" : "",
    "hbase.client.kerberos.principal" : ""
  }
}

  The HBase storage plugin takes all of its client settings as key-value pairs 
in the `config` field. In addition, the Drill documentation page "Configuring 
Kerberos Security" covers Kerberos with Drill.
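
  As a rough sketch (I have not tested this against a Kerberized cluster), the 
two Kerberos keys can be merged into the `config` map of the plugin definition 
from the original question. The helper name `with_kerberos`, the keytab path and 
the principal below are made-up placeholders, and the script only edits the JSON; 
it does not talk to Drill.

```python
import json

# Plugin definition as posted in the original question.
base_plugin = {
    "type": "hbase",
    "config": {
        "hbase.zookeeper.quorum": "10.10.100.62,10.10.10.52,10.10.10.53",
        "hbase.zookeeper.property.clientPort": "2181",
    },
    "size.calculator.enabled": False,
    "enabled": True,
}


def with_kerberos(plugin, keytab, principal):
    """Return a copy of the plugin with the client Kerberos keys added.

    Hypothetical helper: it leaves the input dictionary untouched.
    """
    merged = dict(plugin)
    merged["config"] = dict(plugin["config"])
    merged["config"]["hbase.client.keytab.file"] = keytab
    merged["config"]["hbase.client.kerberos.principal"] = principal
    return merged


# Placeholder values; replace them with the real keytab and principal.
secured = with_kerberos(base_plugin, "/path/to/drill.keytab", "drill@EXAMPLE.COM")
print(json.dumps(secured, indent=2))
```

  The printed JSON can then be pasted into the storage plugin editor in the 
Drill web UI.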



On 2021/7/25 at 6:46 PM, Banda, Ramu wrote:

Hi Team,
We have a question regarding HBase connectivity using Drill, hope you guys help 
us.
We are using Hortonworks Hadoop cluster as a Big Data platform and we got HBase 
as one of the component with in this cluster. Also, we are setting up Apache 
Drill outside of Hadoop cluster.
Now, we would like to query HBase using Apache Drill, we know that this 
solution will work. And, we have used below plugin information.

Our question: the HBase plugin has no username and password settings, so how 
does it authenticate to HBase, which runs outside the Apache Drill host machine?
Note: Hadoop cluster is authenticated through edge node with Kerberos mechanism.
HBase Plugin Information:
{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "10.10.100.62,10.10.10.52,10.10.10.53",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": true
}

Thanks,
Ramu









Re: Strange query crash

2021-08-11 Thread luoc
Hello Ted,
  I think the error stack can be deceiving. There is a ticket (DRILL-4254) 
related to this issue. If the failure is caused by a schema change, I 
recommend that you upgrade to the latest version.
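
  On the index message itself: it may be less contradictory than it looks. The 
exception text has the shape of a range check on `index + length` rather than on 
`index` alone, so a 19-byte access starting at 131071 runs past the end of a 
131072-byte buffer. A minimal sketch of that kind of check, assuming this is how 
the check works (the function name `is_out_of_bounds` is mine, not Drill's):

```python
def is_out_of_bounds(index, length, capacity):
    """True when [index, index + length) does not fit in [0, capacity)."""
    return index < 0 or length < 0 or index + length > capacity


# Values copied from the exception in the stack trace.
index, length, capacity = 131071, 19, 131072

# The start offset alone is inside the buffer.
print(index < capacity)
# The access as a whole is not.
print(is_out_of_bounds(index, length, capacity))
# Number of bytes that would fall past the end of the buffer.
print(index + length - capacity)
```

  That does not explain why the access happens, only why the message reports an 
index that looks valid.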

> On Aug 11, 2021, at 4:18 AM, Ted Dunning wrote:
> 
> 
> I am running a moderate sized data reduction task and getting a strange crash 
> with Drill 1.16.  Stack trace is shown below.
> 
> The query is this:
> 
> ```
> create table dfs.home.`mrms/grib-07.parquet`
> partition by (box)
> as 
> with
> t1 as (
>   select value as precip, datetime as t, cast(latitude as double) as latitude,
>          cast(longitude as double) longitude
>   from table(dfs.home.`mrms/*grib*csv`(type => 'text', fieldDelimiter => ',',
>                                        extractHeader => true))
>   limit 4)
> 
> select precip, latitude, longitude, floor(latitude)*100 - floor(longitude) box
> from t1
> order by box, latitude, longitude, t
> ```
> 
> The basic idea is that we are scanning 740 CSV files containing about 19GB of 
> data and I want to write them to a partitioned parquet dataset. I am 
> progressively increasing the number of lines processed to verify things are 
> working. The process worked fine at 200M rows of data and fails at 400M. The 
> text of the error is disconcerting because it claims that there is an index 
> error, but the index given is in the specified range.
> 
> Does anybody have any ideas on this? I haven't tried more recent versions.
> 
> 
> Fragment 3:0
> 
> Please, refer to logs for more information.
> 
> [Error Id: e681aca3-78b7-496a-9af1-7ec34fcf31a9 on nodec:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:363)
>  [drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329)
>  [drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_292]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_292]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_292]
> Caused by: java.lang.IllegalStateException: 
> java.lang.IndexOutOfBoundsException: index: 131071, length: 19 (expected: 
> range(0, 131072))
>   at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.doWork(RemovingRecordBatch.java:69)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0.10-mapr.jar:1.16.0.10-mapr]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
>

Query the HBase data in Drill

2021-08-24 Thread luoc
Hello Guys,
  Do you use Drill to query Apache HBase? If so, what new features would you 
like to see in the HBase storage plugin? Also note that Drill has supported 
Apache Cassandra since 1.19.
More generally, could you tell me which storage plugins (or data formats) you 
use most often? Thanks for your time.


-- luoc

Re: Query the HBase data in Drill

2021-08-25 Thread luoc
  Thanks for the feedback. Apache HBase and Apache Phoenix are an important 
part of my work. I'm not sure whether anyone has started the `HBase to EVF` 
work for Drill, but that improvement would be valuable.
  In particular, when I recently used Phoenix 5.1 + HBase 2.3 on Hadoop 3.3, I 
found a big improvement over the Phoenix 4.x and HBase 1.x series.
  I look forward to seeing Drill benefit from these advances.
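
  On Ted's point about IN-lists with a few tightly clustered groups of keys 
(quoted below): one way to picture the ideal strategy is to sort the keys, split 
them into ranges wherever the gap between neighbours is large, and then scan each 
range. A toy sketch, assuming integer keys and a hand-picked `max_gap`; none of 
this is existing Drill or HBase planner code.

```python
def cluster_ranges(keys, max_gap):
    """Group keys into (start, end) ranges, split where the gap exceeds max_gap."""
    ordered = sorted(set(keys))
    if not ordered:
        return []
    ranges = []
    start = prev = ordered[0]
    for key in ordered[1:]:
        if key - prev > max_gap:
            # Gap too large: close the current range and open a new one.
            ranges.append((start, prev))
            start = key
        prev = key
    ranges.append((start, prev))
    return ranges


# Three tight clusters in a large key space become three short range scans.
keys = [1001, 1002, 1005, 50010, 50011, 900000, 900003, 900004]
for start, end in cluster_ranges(keys, max_gap=10):
    print(f"scan rows from {start} to {end} (inclusive)")
```

  How to choose the gap threshold, and how to do this for non-numeric row keys, 
is exactly the open question.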

> On Aug 24, 2021, at 23:16, Ted Dunning wrote:
> 
> I know somebody who is querying a very large table and has trouble with
> pushdown.
> 
> They are looking for values indexed by primary key with a query like
> "select * from table where key in s".  If s has a very small number of
> values, this turns into primary key access, but if there are more than just
> a few, it becomes a scan.
> 
> The situation that would be interesting to detect is where s has a few
> tightly clustered groups. The ideal strategy would be to scan each group.
> How this might be detected isn't clear to me, but it would make a massive
> difference to this kind of query.
> 
> Currently, the best alternative is to try to avoid this kind of query and
> build a data flow such that each cluster of keys flows into a separate
> query. This would be made easier if a common table expression (CTE) query
> could be done without having the optimizer try to globally optimize back to
> a single big scan.
> 
> Anyway, I have absolutely no concrete suggestions for making this work, but
> the need is there.
> 
> 
>> On Tue, Aug 24, 2021 at 4:39 AM luoc  wrote:
>> 
>> Hello Guys,
>>  Will you use Drill to query Apache HBase? If so, what new feature would
>> you like to see in HBase storage plugin? In addition, Drill supported the
>> Apache Cassandra since 1.19.
>> Absolutely… Could you tell me what your most common storage plugin (or
>> data format) are? Thanks for your time.
>> 
>> 
>> -- luoc



Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-08-27 Thread luoc


Got it. Thank you, James.
@kingswanwho Any questions?

> On Aug 27, 2021, at 18:50, James Turton wrote:
> 
> On the documentation side of things, introducing .asf.yaml enabled some 
> improvements and better alignment with what other projects do. There are new 
> instructions in drill-site/master/README.md but here's the need-to-know.
> 
> 1. The branch drill/gh-pages is now obsolete.  Its full history has been 
> imported to drill-site/master.  The apache/drill repo should not receive 
> website artefacts at all any more.  After a couple of stable months we can 
> optionally clean up by deleting the drill/gh-pages branch.
> 2. Github Pages (http://apache.github.io/drill/) is out of the picture too 
> now.
> 3. There's a CI build run by ASF infrastructure when new Markdown docs are 
> pushed to master or staging in drill-site.  This generates a new site and 
> deploys it to https://drill.apache.org or https://drill.staged.apache.org 
> respectively.
> 4. Because of 3 there's mostly no need to install or run Jekyll locally any 
> more but you still can in much the same way as before, just simplified.  See 
> README.md in drill-site for more info.
> 
> Regards
> James
> 
>> On 2021/05/31 16:43, Ted Dunning wrote:
>> Drill is on this list.
>> 
>> I think that the fix is relatively trivial, but haven't examined it
>> carefully.
>> 
>> -- Forwarded message -
>> From: Daniel Gruno 
>> Date: Mon, May 31, 2021 at 6:41 AM
>> Subject: [NOTICE] Git web site publishing to be done via .asf.yaml only as
>> of July 1st
>> To: Users 
>> 
>> 
>> TL;DR: if your project web site is kept in subversion, disregard this
>> email please. If your project web site is using git, and you have not
>> deployed it via .asf.yaml, you MUST switch before July 1st or risk your
>> web site goes stale.
>> 
>> 
>> 
>> Dear Apache projects,
>> In order to simplify our web site publishing services and improve
>> self-serve for projects and stability of deployments, we will be turning
>> off the old 'gitwcsub' method of publishing git web sites. As of this
>> moment, this involves 120 web sites. All web sites should switch to our
>> self-serve method of publishing via the .asf.yaml meta-file. We aim to
>> turn off gitwcsub around July 1st.
>> 
>> 
>> ## How to publish via .asf.yaml:
>> Publishing via .asf.yaml is described at:
>> https://s.apache.org/asfyamlpublishing
>> You can also see an example .asf.yaml with publishing and staging
>> profiles for our own infra web site at:
>> https://github.com/apache/infrastructure-website/blob/asf-site/.asf.yaml
>> 
>> In short, one puts a file called .asf.yaml into the branch that needs to
>> be published as the project's web site, with the following two-line
>> content, in this case assuming the published branch is 'asf-site':
>> 
>> publish:
>>   whoami: asf-site
>> 
>> 
>> It is important to note that the .asf.yaml file MUST be present at the
>> root of the file system in the branch you wish to publish. The 'whoami'
>> parameter acts as a guard, ensuring that only the intended branch is used
>> for publishing.
>> 
>> 
>> ## Is my project affected by this?
>> The quickest way to check if you need to switch to a .asf.yaml approach
>> is to check out site source page at
>> https://infra-reports.apache.org/site-source/ - if your site is listed
>> in yellow, you will need to switch. This page will also tell you which
>> branch you are currently publishing as your web site. This is (should
>> be) the branch that you must add a .asf.yaml meta file to.
>> 
>> The web site source list updates every hour. If your project site
>> appears in green, you are already using .asf.yaml for publishing and do
>> not need to make any changes.
>> 
>> 
>> ## What happens if we miss the deadline?
>> If you miss the deadline, don't fret. Your site will of course still
>> remain online as is, but new updates will not appear till you
>> create/edit the .asf.yaml and set up publishing.
>> 
>> 
>> ## Who do we contact if we have questions?
>> Please contact us at us...@infra.apache.org if you have any additional
>> questions.
>> 
>> 
>> With regards,
>> Daniel on behalf of ASF Infra.
>> 



Re: [VOTE] Add LGTM to Drill Pull Requests

2021-08-30 Thread luoc


+1

> On Aug 31, 2021, at 05:23, Charles Givre wrote:
> 
> Hello Drill Devs, 
> I’d like to call a vote as to whether we add LGTM automated code check to our 
> pull requests.  This would not replace the  current review process, but 
> rather add a quality check to new code.  I seem to recall us voting on this 
> before, but I couldn’t find the email, so I apologize for the possible 
> duplicate vote. 
> 
> Thanks!
> — C


Drill at ApacheCon 2021

2021-09-03 Thread luoc
Hello Guys,

  Thank you for your interest in Apache Drill. There is good news for the Drill 
community.

  1. Since February 2021, Drill's Slack channel [1] has grown rapidly: the 
number of users has doubled, matching the total accumulated since 2019.

  2. Apache Drill is back at ApacheCon! We look forward to seeing you in the 
Apache Federated Data Track [2].

  3. Apache Drill and Apache Airflow are more powerful together [3]; see also 
the Drill provider for Airflow [4].

  4. Multilingual support has been added to the website, and contributions of 
docs in your local language are welcome. Try it out!


[1] https://s.apache.org/m3r6y
[2] https://www.apachecon.com/acah2021/tracks/feddata.html
[3] https://drill.apache.org/docs/orchestrating-queries-with-airflow
[4] https://drill.apache.org/blog/2021/08/05/drill-provider-for-airflow


-- luoc

Re: New Docker images published automatically

2021-09-20 Thread luoc
Hello James,
  Great work. Is it possible to add this notice to the GitHub wiki or the 
website docs?

> On Sep 20, 2021, at 19:27, James Turton wrote:
> 
> Hi all
> 
> If you browse to https://hub.docker.com/r/apache/drill/tags, you'll see that 
> we've just started publishing the following new Docker images based on 
> snapshots of Drill master.
> 
> apache/drill:master-openjdk-8 (=master)
>     snapshot of master running on the openjdk:8 base image
> apache/drill:master-openjdk-11
>     snapshot of master running on the latest supported LTS OpenJDK base image
> apache/drill:master-openjdk-14
>     snapshot of master running on the latest supported OpenJDK base image
> 
> The latest *released* version of Drill, which remains recommended for 
> production deployments, is still
> 
> apache/drill:latest
>     latest release running on the openjdk:8 base image
> 
> Starting from the *next* release (1.20) we will also publish
> 
> apache/drill:latest-openjdk-8 (=latest)
>     latest release running on the openjdk:8 base image
> apache/drill:latest-openjdk-11
>     latest release running on the latest supported LTS OpenJDK base image
> apache/drill:latest-openjdk-14
>     latest release running on the latest supported OpenJDK base image
> 
> each of which will also be tagged by Drill version, so the following tags 
> will be identical to those in the preceding paragraph
> 
> apache/drill:1.20.0-openjdk-8 (=latest)
>     latest release running on the openjdk:8 base image
> apache/drill:1.20.0-openjdk-11
>     latest release running on the latest supported LTS OpenJDK base image
> apache/drill:1.20.0-openjdk-14
>     latest release running on the latest supported OpenJDK base image
> 
> Coming back to what's different *today*, the short of it is that you have 
> containerised snapshots of master for testing unreleased code or newer JDK 
> images.
> 
> Regards
> James



Re: Parquet compression codecs

2021-09-29 Thread luoc


James, great work.
Would it be possible to publish a blog post about this on the website?
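
For anyone comparing the codecs, the sizes James reports below can be turned into 
compression ratios against the uncompressed footprint. A small sketch using only 
the numbers from his table:

```python
# Size on disk per codec, copied from the table in James's message.
sizes = {
    "brotli": 87.0,
    "gzip": 80.0,
    "lz4": 100.6,
    "lzo": 100.8,
    "snappy": 192.0,
    "zstd": 85.0,
    "none": 2152.0,
}

uncompressed = sizes["none"]

# Ratio = uncompressed size / compressed size; larger means smaller files.
ratios = {
    codec: uncompressed / size
    for codec, size in sizes.items()
    if codec != "none"
}

# Print the codecs from best to worst compression ratio.
for codec, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{codec:<7} {ratio:5.1f}x")
```

These ratios say nothing about (de)compression speed, which James notes he has 
not measured.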

> On Sep 29, 2021, at 20:27, James Turton wrote:
> 
> Hi all
> 
> We've got support for reading and writing using additional Parquet 
> compression codecs in master now.  Here are the footprints of a 25M record 
> dataset compressed by Drill with different codecs.
> 
> | Codec  | Size on disk (Mb) |
> | ------ | ----------------- |
> | brotli |              87   |
> | gzip   |              80   |
> | lz4    |             100.6 |
> | lzo    |             100.8 |
> | snappy |             192   |
> | zstd   |              85   |
> | none   |            2152   |
> 
> I haven't made measurements of (de)compression speed differences myself but 
> there are many such benchmarks around on the web, and the differences can be 
> big *if* you've got a workload that is CPU bound by (de)compression.  Beyond 
> that there are the usual considerations like better utilisation of the OS 
> page cache by the higher compression ratio codecs, less I/O when data must 
> come from disk, etc.  Zstd is probably the one I'll be putting into 
> `store.parquet.compression` myself at this point.
> 
> Happy Drilling!
> James



Re: [DISCUSS] Being less eager about outbound JDBC connections

2021-10-19 Thread luoc


  James, is your idea related to the HikariCP pools? How do the JDBC 
connections differ for storage plugins that do not use HikariCP?

> On Oct 19, 2021, at 20:37, James Turton wrote:
> 
> HikariCP



Re: [DISCUSS] Being less eager about outbound JDBC connections

2021-10-20 Thread luoc


  James, thanks for the work you did.
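
  For readers skimming the thread, the difference between eager and lazy 
connecting that James describes below can be sketched as follows. This is only an 
illustration of the pattern; `LazyConnection` and `connect_fn` are invented 
names, not Drill or HikariCP APIs.

```python
class LazyConnection:
    """Defers opening a connection until it is first needed."""

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn  # callable that opens the real connection
        self._conn = None

    def get(self):
        # Connect on first use, so constructing the plugin never fails
        # just because the data source is down at load time.
        if self._conn is None:
            self._conn = self._connect_fn()
        return self._conn


attempts = []


def fake_connect():
    """Stand-in for a real connect call; records each attempt."""
    attempts.append("connect")
    return "connection-handle"


plugin_conn = LazyConnection(fake_connect)
print(len(attempts))       # connection attempts made while loading the plugin
print(plugin_conn.get())   # first use triggers the connection
print(plugin_conn.get())   # later uses reuse the same handle
print(len(attempts))       # total connection attempts
```

  An eager plugin would call `connect_fn` in the constructor instead, which is 
why it cannot be loaded while the data source is unavailable.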

> On Oct 20, 2021, at 17:23, James Turton wrote:
> 
> I went and looked at the other storage plugins.  Good news is that we were 
> already being lazy about connecting in 11 of the 13 plugins I tested, the 
> exceptions being storage-jdbc and storage-splunk. Storage-splunk connects 
> eagerly to fetch Splunk indexes and storage-jdbc connects eagerly x 10 
> because of HikariCP.  I've fixed both cases, meaning that now all plugins 
> (that I found) can be loaded even if the data source is not ready at the time.



Re: [DISCUSS] Delete apache/drill:gh-pages and apache/drill:gh-pages-master

2021-10-22 Thread luoc


  Thank you, James.

> On Oct 22, 2021, at 19:42, Charles Givre wrote:
> 
> +1. Thanks for taking care of this!
> 
>> On Oct 22, 2021, at 4:45 AM, James Turton  wrote:
>> 
>> Hi all
>> 
>> It's been a couple of months since we consolidated the Drill website in the 
>> apache/drill-site repo and we've been going along fine since then.  To 
>> remove duplication and confusion I now propose that we delete the old docs 
>> branches in the apache/drill repo. Specifically,
>> 
>> https://github.com/apache/drill/commits/gh-pages-master
>> 
>> which was abandoned in 2015 and
>> 
>> https://github.com/apache/drill/commits/gh-pages
>> 
>> which we abandoned two months ago.
>> 
>> Using the links to Github in this email you can compare the second branch 
>> above with our new branch
>> 
>> https://github.com/apache/drill-site/commits/master
>> 
>> to find that they share the commit 7f75838 from 23 August 2021.  If you 
>> additionally run `git log --reverse` on our new branch you'll see the first 
>> commit dates back to 2014.  These two observations assure us that we have 
>> migrated our full history and continue to append to it.
>> 
>> Branches can't be deleted with a PR, hence this email.  If you have any 
>> concerns please raise them in the next few days, otherwise I'll proceed.
>> 
>> Best wishes
>> James



Re: SQLLine

2021-10-29 Thread luoc


If there is a chance, I would also like to hear about the principles and 
applications of Calcite.

> On Oct 29, 2021, at 16:12, James Turton wrote:
> 
> Just waiting for the join request to sqlline-dev to go through, but I'll 
> reproduce my response here for this community.
> 
> SQLLine is an important project that I've spotted in the wild in contexts 
> well removed from Apache databases and query engines.  E.g. not long ago I 
> found someone who'd shell scripted a surprising amount of ETL against MySQL 
> using SQLLine.  Regardless of opinions about whether he should have reached 
> for an ETL tool instead, the fact remains that he has a good deal of code in 
> production that relies on SQLLine, and it runs.
> 
> +1 for an Apache DB subproject and thank you Julian and Sergey for all of 
> your contributions to this tool.
> 
> 
>> On 2021/10/28 21:56, Julian Hyde wrote:
>> Drill developers,
>> 
>> The SQLLine community is having a conversation about project
>> governance, with options including moving to ASF or an ASF-like PMC
>> model. SQLLine has many users but very few active developers. If you
>> are a SQLLine user, then you are part of the community, and we would
>> like to hear from you. I encourage you to join the sqlline-dev list
>> [1] and chime in.
>> 
>> Julian
>> 
>> [1] https://groups.google.com/g/sqlline-dev



Re: SQLLine

2021-10-29 Thread luoc


 Thank you, Julian. I joined the sqlline-dev.

> On Oct 30, 2021, at 03:48, Julian Hyde wrote:
> 
> Luoc,
> 
> A good way to learn about Calcite is to watch the tutorial that
> Stamatis Zampetakis and I gave at VLDB in August [1]. It is 90 minutes
> but we cover a lot of ground, including building a DB.
> 
> Julian
> 
> [1] https://www.youtube.com/watch?v=meI0W12f_nw
> 
>> On Fri, Oct 29, 2021 at 2:28 AM luoc  wrote:
>> 
>> 
>> If we have a chance, I would also like to hear the principles and 
>> applications of Calcite.
>> 
>>>> On Oct 29, 2021, at 16:12, James Turton wrote:
>>> 
>>> Just waiting for the join request to sqlline-dev to go through, but I'll 
>>> reproduce my response here for this community.
>>> 
>>> SQLLine is an important project that I've spotted in the wild in contexts 
>>> well removed from Apache databases and query engines.  E.g. not long ago I 
>>> found someone who'd shell scripted a surprising amount of ETL against MySQL 
>>> using SQLLine.  Regardless of opinions about whether he should have reached 
>>> for an ETL tool instead, the fact remains that he has a good deal of code 
>>> in production that relies on SQLLine, and it runs.
>>> 
>>> +1 for an Apache DB subproject and thank you Julian and Sergey for all of 
>>> your contributions to this tool.
>>> 
>>> 
>>>> On 2021/10/28 21:56, Julian Hyde wrote:
>>>> Drill developers,
>>>> 
>>>> The SQLLine community is having a conversation about project
>>>> governance, with options including moving to ASF or an ASF-like PMC
>>>> model. SQLLine has many users but very few active developers. If you
>>>> are a SQLLine user, then you are part of the community, and we would
>>>> like to hear from you. I encourage you to join the sqlline-dev list
>>>> [1] and chime in.
>>>> 
>>>> Julian
>>>> 
>>>> [1] https://groups.google.com/g/sqlline-dev
>> 



Re: A new developer wiki begins!

2021-10-30 Thread luoc


It cannot get any better than this!

> On Oct 30, 2021, at 5:39 PM, James Turton wrote:
> 
> I'm delighted to report that the gold mine of developer information that is 
> Paul Rogers' Drill wiki has just formed the basis of a new Drill developer 
> wiki.
> 
> https://github.com/apache/drill/wiki
> 
> The community would like to thank Paul for this sizeable and valuable 
> contribution, and for his blessing that we proceed to merge the work under 
> the normal Apache contributor terms.
> 
> Our work here is just beginning.  A wiki is never a completed work, but 
> requires ongoing editing from all of us to remain complete and accurate.  
> Let's go on to make it the powerful asset for future Drill developers that it 
> certainly can be.
> 
> James
> 



Re: A new developer wiki begins!

2021-10-31 Thread luoc


That is good advice. I recommend adding a page (or a table list) listing all 
the wiki contributors. Paul is the founding member.

> On Nov 1, 2021, at 01:45, Charles Givre wrote:
> 
> This is great!  Can we give @paul-rogers some credit on these pages?  Also 
> I'd really love to merge the existing dev docs in the github repo with the 
> wiki docs.  I'm willing to help with that, time permitting.
> -- C
> 
>> On Oct 30, 2021, at 5:59 AM, luoc  wrote:
>> 
>> 
>> It cannot get any better than this!
>> 
>>> On Oct 30, 2021, at 5:39 PM, James Turton wrote:
>>> 
>>> I'm delighted to report that the gold mine of developer information that is 
>>> Paul Rogers' Drill wiki has just formed the basis of a new Drill developer 
>>> wiki.
>>> 
>>> https://github.com/apache/drill/wiki
>>> 
>>> The community would like to thank Paul for this sizeable and valuable 
>>> contribution, and for his blessing that we proceed to merge the work under 
>>> the normal Apache contributor terms.
>>> 
>>> Our work here is just beginning.  A wiki is never a completed work, but 
>>> requires ongoing editing from all of us to remain complete and accurate.  
>>> Let's go on to make it the powerful asset for future Drill developers that 
>>> it certainly can be.
>>> 
>>> James
>>> 
>> 


Drill 1.20 release plan

2021-11-01 Thread luoc

Hello, Drill dev and users :

Since the 1.19 release, the Drill master branch has accumulated many changes, 
bug fixes and enhancements. The Drill team plans to release 1.20 at the end of 
November 2021.

We have some things to work out :

1. Are you willing to be the 1.20 release manager?

2. Is there an unmerged pull request that you want to complete?

3. Do you have a feature under development that you want to include in 1.20?

4. Would you like to help with testing and feedback (building from the master branch)?

I hope everyone will take part in the discussion and reply to these questions 
as soon as possible, thank you.

-- luoc




Re: A new developer wiki begins!

2021-11-02 Thread luoc


LGTM. +1

> On Nov 2, 2021, at 21:33, James Turton wrote:
> 
> Hi Charles
> 
> When I first took this idea to Paul I proposed that we attribute authorship 
> but he declined that bit.  We do have the Git history for the wiki, and the 
> lines shown for the last Git commit to affect a page are quite visible in the 
> wiki, e.g.
> 
> > Paul Rogers edited this page on 27 Apr 2020.
> 
> But those will of course fade over time as others add commits.  I did not 
> argue the matter, just concluded with "if you ever change your mind, tell us 
> and we will add an attribution". To give you an idea of what Cong's table of 
> authors might look like if it was ranked by number of commits, here's the 
> output of git shortlog -sn.
> 
>    752  Paul Rogers
>      8  Mohamed Gelbana
>      1  Boaz Ben-Zvi
>      1  Dobes Vandermeer
>      1  Muhammad Gelbana
> 
> The size of Paul's contribution is humbling.  We can still add a page with 
> author names (with or without any edit stats) on it, I wouldn't expect Paul 
> to object.  He seemed mostly to be saying "it's not necessary for me".
> 
> 
>> On 2021/11/01 02:03, luoc wrote:
>> That is good advice. I recommend adding a page (or a table list) listing all 
>> the wiki contributors. Paul is the founding member.
>> 
>>>> On Nov 1, 2021, at 01:45, Charles Givre wrote:
>>> 
>>> This is great!  Can we give @paul-rogers some credit on these pages?  Also 
>>> I'd really love to merge the existing dev docs in the github repo with the 
>>> wiki docs.  I'm willing to help with that, time permitting.
>>> -- C
>>> 
>>>> On Oct 30, 2021, at 5:59 AM, luoc  wrote:
>>>> 
>>>> 
>>>> It cannot get any better than this!
>>>> 
>>>>> On Oct 30, 2021, at 5:39 PM, James Turton wrote:
>>>>> 
>>>>> I'm delighted to report that the gold mine of developer information that 
>>>>> is Paul Rogers' Drill wiki has just formed the basis of a new Drill 
>>>>> developer wiki.
>>>>> 
>>>>> https://github.com/apache/drill/wiki
>>>>> 
>>>>> The community would like to thank Paul for this sizeable and valuable 
>>>>> contribution, and for his blessing that we proceed to merge the work 
>>>>> under the normal Apache contributor terms.
>>>>> 
>>>>> Our work here is just beginning.  A wiki is never a completed work, but 
>>>>> requires ongoing editing from all of us to remain complete and accurate.  
>>>>> Let's go on to make it the powerful asset for future Drill developers 
>>>>> that it certainly can be.
>>>>> 
>>>>> James
>>>>> 



Re: Drill 1.20 release plan

2021-11-03 Thread luoc


Thanks for your support, James. Since there are no negative votes, I will 
recommend you as the release manager.

We'll keep a light on for you.

> On Nov 3, 2021, at 01:04, James Turton wrote:
> 
> Hi luoc
> 
> I am willing to help the release in any capacity needed.  I know there are 
> others who have release experience while I do not but I'm sure it can be 
> learned.  I'll have PR #2351 done this week, it would be nice (but not 
> critical) to include it.
> 
> Thanks
> James
> 
>> On 2021/11/01 16:27, luoc wrote:
>> 
>> Hello, Drill dev and users :
>> 
>> Since the latest 1.19, Drill master branch has collected many changes, bug 
>> fixed and enhanced. Drill team plan to release the 1.20 at the end of 
>> November 2021.
>> 
>> We have some things to work out :
>> 
>> 1. Are you willing to be the 1.20 release manager?
>> 
>> 2. Is there one of the unmerged pull request that you want to complete?
>> 
>> 3. Do you have a feature under development and want to include in 1.20?
>> 
>> 4. Would you like to help with the test and feedback (build with master 
>> branch)?
>> 
>> I hope everyone will participate in the talk and reply to these questions as 
>> soon as possible, thank you.
>> 
>> -- luoc
>> 
>> 



Re: Drill 1.20 release plan

2021-11-03 Thread luoc


Thank you, Charles. I can see you everywhere.
The most competent PMC Chair.

> On Nov 3, 2021, at 21:24, Charles Givre wrote:
> Hi Luoc, 
> IMHO there are a few PRs in flight that I’d like to see included in the next 
> release.  I sent them in slack, but so that they are preserved for the 
> mailing list.  I'd like to see DRILL-1282, DRILL-7938, DRILL-8027, DRILL-8028 
> and possibly DRILL-8009 and DRILL-7978 get merged for the next release.  
> DRILL-7871 would be a stretch goal.
> Best,
> — C
> 
>> On Nov 3, 2021, at 9:21 AM, luoc  wrote:
>> 
>> 
>> Thanks for your support, James. Since there are no negative votes, I will 
>> recommend you as the release manager.
>> 
>> We'll keep a light on for you.
>> 
>>>> On Nov 3, 2021, at 01:04, James Turton wrote:
>>> Hi luoc
>>> I am willing to help the release in any capacity needed.  I know there are 
>>> others who have release experience while I do not but I'm sure it can be 
>>> learned.  I'll have PR #2351 done this week, it would be nice (but not 
>>> critical) to include it.
>>> Thanks
>>> James
>>>> On 2021/11/01 16:27, luoc wrote:
>>>> 
>>>> Hello, Drill dev and users :
>>>> Since the latest 1.19, Drill master branch has collected many changes, bug 
>>>> fixed and enhanced. Drill team plan to release the 1.20 at the end of 
>>>> November 2021.
>>>> We have some things to work out :
>>>> 1. Are you willing to be the 1.20 release manager?
>>>> 2. Is there one of the unmerged pull request that you want to complete?
>>>> 3. Do you have a feature under development and want to include in 1.20?
>>>> 4. Would you like to help with the test and feedback (build with master 
>>>> branch)?
>>>> I hope everyone will participate in the talk and reply to these questions 
>>>> as soon as possible, thank you.
>>>> -- luoc



Re: Start embedded Drill on JDBC connection

2021-11-05 Thread luoc

Hi Maksym,
Thanks for the idea. If I understand correctly, there would no longer be any 
need to extract the binary tar.gz and start the Drillbit, only to add the Drill 
JDBC dependency to the pom file? If so, what is the lifecycle of the embedded Drill?

> On Nov 5, 2021, at 21:06, Rumar, Maksym wrote:
> 
> Hi all drill devs and users!
> 
> I have one thought about embedded Drill and would like to discuss it with you.
> The Drill JDBC driver can start embedded Drill by itself (with some 
> adjustments to dependencies) and I think that is a very useful feature. 
> With it, people who are not familiar with Drill could try it in a convenient 
> and simple way: add dependencies to the pom and write a few lines of JDBC code 
> to run a test. This would lower the barrier to entry for Drill and might make 
> Drill better known.
> 
> This feature is not supported at the moment; it is blocked by a simple check 
> in the Drill JDBC code. What do you think about it? What if we add support for 
> it and improve it by adding a convenient way to set up storage plugins for 
> this case?
> 
> Regards,
> Maksym


Re: Re: Start embedded Drill on JDBC connection

2021-11-12 Thread luoc
Hi Maksym,

There seem to be no objections to this practice. So, you can create a pull 
request and let's continue to talk about it in code.

> On Nov 8, 2021, at 8:50 PM, Rumar, Maksym wrote:
> 
> "just can't wrap my head around what the pom file would need to look like"
> Yes, good remark. It would be great if it were enough to use only the jdbc-all 
> jar, but that was created for another purpose, so it would probably be a bad 
> idea to add all the jars needed for embedded Drill to the jdbc-all jar, as it 
> should contain only the jars that the Drill JDBC driver needs.
> 
> So I see just a few ways, each of which has its pros and cons:
> 
>  1.  Create a module similar to jdbc-all, which bundles all the dependencies 
> needed for embedded Drill in one module. A user then has to add only 2 
> dependencies: jdbc and "embedded-drill". This approach is pretty simple but 
> requires creating one more module in the Drill project.
>  2.  Another way is to create a tutorial with all the necessary dependencies 
> in its example pom, which users can copy and paste for their tests. This way 
> is not user friendly and actually ruins the main idea, a convenient and easy 
> way to try Drill. Still, this approach is viable, depending on how many jars 
> need to be added to the user's pom file.
> 
> 
> From: James Turton 
> Sent: November 6, 2021 7:19
> To: dev@drill.apache.org ; Rumar, Maksym 
> 
> Subject: Re: Re: Start embedded Drill on JDBC connection
> 
> I like the idea, just can't wrap my head around what the pom file would
> need to look like.
> 
> On 2021/11/05 23:34, Rumar, Maksym wrote:
>> Yes, right. Only import the Drill JDBC dependency into the pom file. 
>> Embedded Drill could start when the connection is established and stop when 
>> the connection is closed.
>> 
>> I understand that it is an expensive operation, but it is needed to simplify 
>> startup. With it, we would have 3 ways of bootstrapping Drill: embedded, 
>> standalone embedded and distributed. And each one has its use cases.
>> 
>> From: luoc 
>> Sent: November 5, 2021 15:45
>> To: dev@drill.apache.org 
>> Subject: Re: Start embedded Drill on JDBC connection
>> 
>> 
>> Hi Maksym,
>> Thanks for the idea. In your opinion, is there no longer any need to extract 
>> the binary tar.gz and start the Drillbit, only to import the Drill JDBC 
>> dependency into the pom file? If so, what is the lifecycle of embedded Drill?
>> 
>>> On Nov 5, 2021, at 21:06, Rumar, Maksym wrote:
>>> 
>>> Hi all drill devs and users!
>>> 
>>> I have one thought about embedded Drill and would like to discuss it with 
>>> you.
>>> The Drill JDBC driver can start embedded Drill by itself (with some 
>>> adjustments to dependencies) and I think that is a very useful feature. 
>>> With it, people who are not familiar with Drill could try it in a 
>>> convenient and simple way: add dependencies to the pom and write a few 
>>> lines of JDBC code to run a test. This would lower the barrier to entry 
>>> for Drill and might make Drill better known.
>>> 
>>> This feature is not supported at the moment; it is blocked by a simple 
>>> check<https://github.com/apache/drill/blob/4aefcef2b665c5737471664912a26ef6ed9a6cfc/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java#L109>
>>>  in the Drill JDBC code. What do you think about it? What if we add support 
>>> for it and improve it by adding a convenient way to set up storage plugins 
>>> for this case?
>>> 
>>> Regards,
>>> Maksym
> 
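
The lifecycle proposed in this thread (embedded Drill starts when the JDBC connection is established and stops when it is closed) can be sketched roughly as below. This is only an illustration under stated assumptions: `EmbeddedEngine` and `EmbeddedConnection` are hypothetical names invented for the sketch, not Drill classes, and the real startup cost is replaced by a flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for an in-process engine; not a Drill class. */
final class EmbeddedEngine {
    private final AtomicBoolean running = new AtomicBoolean(false);

    void start() { running.set(true); }    // would be the expensive step in a real engine
    void stop() { running.set(false); }
    boolean isRunning() { return running.get(); }
}

/** Hypothetical connection wrapper that owns the engine it started. */
final class EmbeddedConnection implements AutoCloseable {
    private final EmbeddedEngine engine;

    private EmbeddedConnection(EmbeddedEngine engine) {
        this.engine = engine;
    }

    /** Opening the connection starts the engine. */
    static EmbeddedConnection open(EmbeddedEngine engine) {
        engine.start();
        return new EmbeddedConnection(engine);
    }

    /** Closing the connection stops the engine. */
    @Override
    public void close() {
        engine.stop();
    }
}

public class EmbeddedLifecycleSketch {
    public static void main(String[] args) {
        EmbeddedEngine engine = new EmbeddedEngine();
        try (EmbeddedConnection conn = EmbeddedConnection.open(engine)) {
            System.out.println("running while open: " + engine.isRunning());
        }
        System.out.println("running after close: " + engine.isRunning());
    }
}
```

Tying the engine to try-with-resources keeps the "too expensive operation" bounded to the lifetime of one connection, which matches the use case of a quick first test rather than a long-running service.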



Re: Drill Wiki Access

2021-11-24 Thread luoc

Hi Maksym,
  Thank you for the feedback. I'm worried about attracting spammers; I'd say 
the risk is medium.

> On Nov 24, 2021, at 23:58, Rumar, Maksym wrote:
> Hi all,
> 
> I just found that I can add and edit any page on the Drill wiki. That means 
> that anybody can add and remove anything they like. What do you think 
> about it? Should the project have documentation that is open for all to edit?
> 
> As far as I know, GitHub doesn't support pull requests for the wiki repository, 
> so the question is what the process for changing the Drill wiki should look 
> like. What are your thoughts about it?
> 
> Regards,
> Maksym



Re: Drill 1.20 release plan

2021-12-01 Thread luoc
Hi James,
  Is there a ticket related to Calcite?

> On Dec 1, 2021, at 21:14, James Turton wrote:
> We've picked up a bug: DRILL-8063.  It was present in earlier releases 
> (verified in 1.18) and may require a fix in Calcite rather than our own 
> codebase.  It's a severe one, resulting in an OOM crash for particular 
> queries.  Please indicate if you would like to have the following marked as a 
> release blocker.  Lazy consensus will apply, i.e. no reply = "I do not 
> believe this bug should block 1.20".
> 
> https://issues.apache.org/jira/browse/DRILL-8063
> 
> 
>> On 2021/12/01 12:00, James Turton wrote:
>> Hi all
>> 
>> Given we've had no objections, please strive to merge your PRs for 1.20 by 
>> 10 December which is the current targeted freeze date.
>> 
>> Closed:
>> DRILL-1282 Parquet v2 read+write
>> DRILL-8027 Iceberg format
>> DRILL-8009 JDBC isValid()
>> 
>> Open:
>> DRILL-7863 Phoenix storage
>> DRILL-7978 Fixed width format
>> DRILL-7983 Get running/completed profiles from REST API
>> DRILL-8015 MongoDB Metastore
>> DRILL-8028 PDF format
>> * DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
>> not yet clear)
>> 
>> 
>>> On 2021/11/25 09:53, James Turton wrote:
>>> Dear dev community
>>> Please see an update on the Jiras earmarked for 1.20 below. We have of 
>>> course also closed other Jiras in the intervening period. If you are aware 
>>> of any reason that one of the listed Jiras will not be ready please say so, 
>>> so I can remove it. Otherwise I'll post comments to the authors asking them 
>>> to aim for merging by the release cut-off date.  How does a cut-off date of 
>>> 10 December sound?
>>> (* indicates a Jira not previously discussed in this thread.)
>>> Closed:
>>> DRILL-1282 Parquet v2 read+write
>>> DRILL-8027 Iceberg format
>>> Open:
>>> DRILL-7863 Phoenix storage
>>> DRILL-7978 Fixed width format
>>> DRILL-7983 Get running/completed profiles from REST API (corrected from 
>>> 7938 which I believe was a typo)
>>> DRILL-8009 JDBC isValid()
>>> * DRILL-8015 MongoDB Metastore
>>> DRILL-8028 PDF format
>>> The stretch goal DRILL-7871 (StoragePluginStore instance per user) has not 
>>> yet reached design consensus so I propose that it should not be included in 
>>> 1.20.
>>> On 2021/11/03 15:24, Charles Givre wrote:
>>>> Hi Luoc,
>>>> IMHO there are a few PRs in flight that I’d like to see included in the 
>>>> next release.  I sent them in Slack, but am repeating them here so that 
>>>> they are preserved for the mailing list.  I'd like to see DRILL-1282, 
>>>> DRILL-7938, DRILL-8027, DRILL-8028 and possibly DRILL-8009 and DRILL-7978 
>>>> get merged for the next release. DRILL-7871 would be a stretch goal.
>>>> Best,
>>>> — C
>>>>> On Nov 3, 2021, at 9:21 AM, luoc  wrote:
>>>>> Thanks for your support, James. Since there are no negative votes, I will 
>>>>> recommend you as the release manager.
>>>>> We'll keep a light on for you.
>>>>>> On Nov 3, 2021, at 01:04, James Turton wrote:
>>>>>> Hi luoc
>>>>>> I am willing to help the release in any capacity needed. I know there 
>>>>>> are others who have release experience while I do not but I'm sure it 
>>>>>> can be learned.  I'll have PR #2351 done this week, it would be nice 
>>>>>> (but not critical) to include it.
>>>>>> Thanks
>>>>>> James
>>>>>>> On 2021/11/01 16:27, luoc wrote:
>>>>>>> 
>>>>>>> Hello, Drill dev and users :
>>>>>>> Since the latest release, 1.19, the Drill master branch has collected 
>>>>>>> many changes, bug fixes and enhancements. The Drill team plans to 
>>>>>>> release 1.20 at the end of November 2021.
>>>>>>> We have some things to work out :
>>>>>>> 1. Are you willing to be the 1.20 release manager?
>>>>>>> 2. Is there an unmerged pull request that you want to complete?
>>>>>>> 3. Do you have a feature under development that you want to include in 1.20?
>>>>>>> 4. Would you like to help with the test and feedback (build with master 
>>>>>>> branch)?
>>>>>>> I hope everyone will participate in the talk and reply to these 
>>>>>>> questions as soon as possible, thank you.
>>>>>>> -- luoc
> 
> 



Re: Drill Meetup - thank you!

2021-12-05 Thread luoc


Tengfei, Welcome!
We'll see you next time.

> On Dec 5, 2021, at 13:58, 王腾飞(飞腾) wrote:
> 
> I really regret having missed this event and will join next time. Do we have 
> meeting minutes like before?
> https://docs.google.com/document/d/1o2GvZUtJvKzN013JdM715ZBzhseT0VyZ9WgmLMeeUUk/edit#heading=h.z8q6drmaybbj
> --
> From: James Turton 
> Sent: Saturday, December 4, 2021 12:16
> To: u...@drill.apache.org ; dev 
> Subject: Drill Meetup - thank you!
> 
> Thank you to everyone who attended and made this exciting event happen.
> 
> I'd like to make special mention of Cong Luo (luoc) and Jingchuan Hu 
> (kingswan) who honoured the community by staying up to join this first 
> call quite literally in the middle of the night.  I will definitely 
> learn how to record the calls for you and, if anyone has any clever 
> global time zone scheduling tricks, please propose them at any time.
> 
> Until next time!
> James



Re: [LAZY VOTE] Drill 1.20 freeze delay

2021-12-08 Thread luoc


I accept that.
And I see that DRILL-8015 and DRILL-7983 have made good progress in code 
review. This cut-off date (December 16) should be the final one.

> On Dec 8, 2021, at 11:51 PM, Charles Givre wrote:
> 
> I think it's worth extending a week. I'd like to see DRILL-8073, 8069 and 
> 8067 added to the list as they seem fairly important. 
> -- C
> 
> 
>> On Dec 8, 2021, at 10:40 AM, James Turton  wrote:
>> 
>> Dear dev community
>> 
>> Please reply if you *object* to us pushing out the freeze date by one week 
>> to 2021-12-16.  The motivation to delay is to try to include more of the 
>> open PRs that we are tracking below, a number of which are essentially 
>> dev-complete but not yet over the line.
>> 
>> Closed
>> 
>> DRILL-1282 Parquet v2 read+write 
>> DRILL-7863 Phoenix storage
>> DRILL-8027 Iceberg format 
>> DRILL-8009 JDBC isValid() 
>> 
>> Open
>> 
>> DRILL-7978 Fixed width format 
>> DRILL-7983 Get running/completed profiles from REST API 
>> DRILL-8015 MongoDB Metastore 
>> DRILL-8028 PDF format 
>> DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
>> not yet clear)
>> 
>> Thank you
>> James
>> 
>> 
> 



Re: New Scam/Spam ticket

2021-12-24 Thread luoc


Hi Maksym,
  This ticket has been closed. Thanks for the feedback.

> On Dec 24, 2021, at 18:45, Rumar, Maksym  wrote:
> 
> Hi, everyone!
> 
> I noticed that someone has created a scam/spam ticket in the Drill Jira. 
> Could someone who has access please remove it?
> 
> https://issues.apache.org/jira/browse/DRILL-8091
> 
> 
> 


Re: New Scam/Spam ticket

2022-01-01 Thread luoc

INFRA-22687 <https://issues.apache.org/jira/projects/INFRA/issues/INFRA-22687> 
Done!

All issues deleted, all users blocked.

> On Dec 27, 2021, at 4:21 PM, James Turton wrote:
> 
> Thank you both.  Please report these to Apache Infra so that they can lock 
> out the abusive user account in addition to deleting the spam ticket.  
> Closing the ticket is good but then it still contaminates the project's Jira 
> statistics in a minor way and the account may create more.
> 
> On 2021/12/24 13:45, luoc wrote:
>> Hi Maksym,
>>   This ticket has been closed. Thanks for the feedback.
>> 
>>> On Dec 24, 2021, at 18:45, Rumar, Maksym  wrote:
>>> 
>>> Hi, everyone!
>>> 
>>> I noticed that someone has created a scam/spam ticket in the Drill Jira. 
>>> Could someone who has access please remove it?
>>> 
>>> https://issues.apache.org/jira/browse/DRILL-8091
>>> 
>>> <https://issues.apache.org/jira/browse/DRILL-8091>
>>> 



Re: [LAZY VOTE] Delete branches gh-pages and gh-pages-master from apache/drill

2022-01-03 Thread luoc


James, could you please confirm that there is no link to `gh-pages` directly in 
the current document?

> On Jan 3, 2022, at 16:28, James Turton  wrote:
> 
> It's been about four months since we moved the Drill website source over to 
> apache/drill-site.  Things have been working fine and we took the full commit 
> history across when we migrated so I propose to delete this cruft from 
> apache/drill.
> 
> Please reply if you object.
> 
> Thanks
> James


Re: Happy new year!

2022-01-03 Thread luoc
Happy New Year 2022!

For the second meetup, I’m going to initiate a quick discussion: Speed Up 
Release
1. Community status
2. Release frequency
3. Contributor Development

> On Jan 3, 2022, at 3:31 PM, James Turton wrote:
> 
> Hi everyone
> 
> Happy new year to one and all, and here's to all the exciting developments 
> coming our way.
> 
> Firstly: Drill 1.20 has not been forgotten.  We have been holding off while 
> debugging some final issues in DRILL-8061, but the freeze is imminent.
> 
> We've got another community meetup this Friday.  Some folks may of course 
> still be on holiday but at the very least you'll find me on the other end of 
> the line if you dial in.  Please reply here if there are any topics you'd like 
> to have added to the agenda...
> 
> Regards
> James



Re: Happy new year!

2022-01-05 Thread luoc


Hello James,
  Could you please send an email about the meetup time, and forward the 
message to the Slack channel? I found that a lot of people didn't know 
about the last meetup. Thank you.

> On Jan 3, 2022, at 6:14 PM, luoc wrote:
> 
> Happy New Year 2022!
> 
> For the second meetup, I’m going to initiate a quick discussion: Speed Up 
> Release
> 1. Community status
> 2. Release frequency
> 3. Contributor Development
> 
>> On Jan 3, 2022, at 3:31 PM, James Turton wrote:
>> 
>> Hi everyone
>> 
>> Happy new year to one and all, and here's to all the exciting developments 
>> coming our way.
>> 
>> Firstly: Drill 1.20 has not been forgotten.  We have been holding off while 
>> debugging some final issues in DRILL-8061, but the freeze is imminent.
>> 
>> We've got another community meetup this Friday.  Some folks may of course 
>> still be on holiday but at the very least you'll find me on the other end of 
>> the line if you dial in.  Please reply here if there are any topics you'd 
>> like to have added to the agenda...
>> 
>> Regards
>> James



Re: Next community meetup

2022-01-05 Thread luoc
Hi all,

I invited a user to do a demo (about 20 minutes); please put it in fourth 
position, thank you.

What about this?
The GoodData.CN analytics platform queries Drill, which manages Minio, Postgres, 
Vertica, Kafka and HTTP API data sources, in real time
A generated data model freely inspired by the book Zero by Mark Elsberg
A list of SQL features missing in Drill that we consider mandatory for analytics

Jan Soubusta, a user active in Slack, works at GoodData.CN, and they have built 
Drill into their cloud platform as an engine.

> On Jan 5, 2022, at 6:18 PM, James Turton wrote:
> 
> Just a quick reminder of this week's meetup which occurs at 8am PDT on Friday.
> 
> https://zoom.us/j/84291153325?pwd=NWZ1MS9lbkVmMHZXWmtXN2loQ0ZZZz09
> 
> (Meeting ID 84291153325, passcode 488311)
> 
> The same details are also always available on the Community Resources web page
> 
> https://drill.apache.org/community-resources/
> 
> The agenda, which is not rigid, already contains
> 
> 1. Community status
> 2. Release frequency
> 3. Contributor Development
> 4. ValueVectors and possible replacements for Drill 2.0
> 
> James



Re: [DISCUSS] Lombok - friend or foe?

2022-01-22 Thread luoc

Hi all,

I have a story here. In Oct 2021, I upgraded Eclipse to the latest release 
(2021-09) and then found out that the Lombok dependency had been added to the 
Drill repository, so I installed Lombok (as a new plugin) from the Eclipse 
Marketplace as I used to. Finally, I restarted the IDE and prepared to open the 
Drill project, but it crashed because of issue #2956, and Lombok was not 
available until I found a temporary workaround.

I use both Eclipse and IDEA, but I use Eclipse more often. I have no objection 
to the use of Lombok, but suggest the following three points :

1. Could we use Lombok only in `drill-contrib` module?

2. Could we agree not to use Lombok in common module?

3. It is best to update the dev documentation to describe the outcome if we 
continue to use Lombok.

In fact, I have the same idea as Paul, more about balancing choices.

Thanks.

> On Jan 22, 2022, at 5:34 PM, Paul Rogers wrote:
> 
> Hi All,
> 
> I look at any tool as a cost/benefit tradeoff. If Drill were a typical
> business app, with lots of "data objects", then the hassle of Lombok might
> be a net win. However, the nature of Drill is that we have very few data
> objects. We have lots of Protobuf objects, or Jackson-serialized objects,
> but not too many data objects of the kind used with object-relational
> mappers.
> 
> On the other hand, I had to spend an hour or so trying to figure out why
> things would not build in Eclipse. Then, more time to figure out how to
> install the half-finished Lombok plugin for Eclipse and various other
> fiddling.
> 
> So, I'd guess, on balance, Lombok has cost, and will continue to cost, more
> time than it saved avoiding a few getter/setter methods. And, I agree with
> Ted, Eclipse (and, I assume IntelliJ), is pretty quick at generating those
> methods.
> 
> Since Lombok has a cost, and is not a huge win, KISS suggests we avoid
> adding extra dependencies unnecessarily.
> 
> That's my 2 cents...
> 
> - Paul
> 
> 
> 
> On Fri, Jan 21, 2022 at 8:51 AM Ted Dunning  wrote:
> 
>> A couple of years ago, I had a dev introduce Lombok into some code without
>> me knowing. That let me be a classic naive user.
>> 
>> The result was total confusion on my part. Sooo much code was being
>> automagically generated that I couldn't figure out the code and spent a lot
>> of time chasing my tail and very little time looking at the crux of the
>> code.
>> 
>> My own personal preference is either
>> 
>> - use a language like Julia if you want magic. It's fantastic and all to
>> have amazing stuff and coders expect to see it.
>> 
>> - use an IDE to generate the boiler plate and put it into its own little
>> annex in the code with the interesting bits near the top of classes. That
>> lets debuggers and IDEs that don't understand Lombok function without
>> impairing readability much. Concurrent with that, use discipline to not do
>> strange things like changing the expected meaning of the boilerplate.
>> 
>> That's my preference, but I wouldn't want to push that preference very
>> hard. My own prioritization is on readability of the code by outsiders.
>> 
>> 
>> 
>> 
>> On Fri, Jan 21, 2022 at 2:25 AM James Turton  wrote:
>> 
>>> Hi again Devs
>>> 
>>> This one is simple to describe.  Lombok entered the Drill code base this
>>> year, but not everyone feels that Lombok is appropriate for every code
>>> base.  To my, fairly limited, understanding the advantage of Lombok is
>>> that boilerplate code is reduced while the disadvantage is the
>>> deployment of code generation magic that can have untoward effects on
>>> build-time tools and IDEs.
>>> 
>>> So here is a chance to opine on Lombok if you'd like to.  My own opinion
>>> is very near neutral and goes something like "It burned me a bit once,
>>> but hasn't since, and less boilerplate is nice.  I guess it can stay
>>> .  I hope I don't regret this one day."
>>> 
>>> Regards
>>> James
>>> 
>> 
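
For readers who have not met Lombok, the trade-off discussed in this thread can be made concrete with a plain-Java class that follows the layout suggested above: the interesting bits near the top and the hand-written (or IDE-generated) boilerplate in its own annex. This is a sketch only; `ExampleFormatConfig` is a hypothetical class, not one from the Drill code base. Lombok annotations such as `@Getter`, `@EqualsAndHashCode` and `@ToString` would generate roughly the members in the annex.

```java
import java.util.Objects;

/** Hypothetical plugin config written without Lombok; not a Drill class. */
public final class ExampleFormatConfig {
    // --- interesting bits ---
    private final String extension;
    private final int headerRows;

    public ExampleFormatConfig(String extension, int headerRows) {
        this.extension = Objects.requireNonNull(extension, "extension");
        this.headerRows = headerRows;
    }

    // --- boilerplate annex (what an IDE would generate) ---
    public String getExtension() {
        return extension;
    }

    public int getHeaderRows() {
        return headerRows;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ExampleFormatConfig)) {
            return false;
        }
        ExampleFormatConfig that = (ExampleFormatConfig) o;
        return headerRows == that.headerRows && extension.equals(that.extension);
    }

    @Override
    public int hashCode() {
        return Objects.hash(extension, headerRows);
    }

    @Override
    public String toString() {
        return "ExampleFormatConfig(extension=" + extension
            + ", headerRows=" + headerRows + ")";
    }

    public static void main(String[] args) {
        ExampleFormatConfig a = new ExampleFormatConfig("csv", 1);
        ExampleFormatConfig b = new ExampleFormatConfig("csv", 1);
        System.out.println(a + " equals copy: " + a.equals(b));
    }
}
```

The annex is longer than the two fields it serves, which is the cost Lombok removes; in exchange, every member is visible to debuggers and IDEs without a plugin.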



Re: [DISCUSS] Lombok - friend or foe?

2022-01-24 Thread luoc
I recommend creating this small task on GitHub Issues or JIRA, adding the 
"newcomer" tag to provide a good chance (to contribute) for newcomers.

Supporting new developers is the best thing for the Drill community.

Thanks.

> On Jan 24, 2022, at 5:01 PM, James Turton wrote:
> 
> Okay, let's approach it that way around: remove it entirely, and Lombok can 
> make a return to plugins *after* they start being built and tested away from 
> the main tree, if any plugin authors want it.
> 
> P.S. Plugins did indeed not annotate any data objects.  Lombok's use there, 
> in what I've seen, has been for the automatic generation of stuff like 
> constructors, getters, setters, loggers, toStrings and hashCodes.  That's 
> just for interest's sake, not an effort to remotivate Lombok's inclusion.
> 
> On 2022/01/24 10:16, Paul Rogers wrote:
>> A quick check of the source suggests that the Easy Format config builder
>> (which is a nice addition) does not use Lombok. Someone coded up (or had
>> their IDE code up) the setters one-by-one. Makes sense, Lombok isn't for
>> the builder pattern.
>> 
>> Note that, allowing Lombok in any part of Drill is the same as allowing it
>> everywhere. The old CS thing that the only numbers that matter are 0, 1 and
>> infinity. To do a PR, all tests should pass, which means that the IDE needs
>> to be able to debug any that have problems. If any plugin uses Lombok, then
>> developers have to wrestle with it. (But, what is a plugin doing with data
>> objects?)
>> 
>> So, perhaps remove it entirely for now. It can be added back for extensions
>> when those extensions are separate projects. (Though, adding that
>> dependency on one extension adds it for everyone. Will there be Lombok
>> version conflicts? Should we wait for class loader isolation before
>> allowing it back?)
>> 
>> In general, Drill is so large that it should not take on more dependencies
>> unless they are a huge win. This is a reason to move the obscure plugins
>> out of the core: mucking with distributed systems should not also require
>> one to muck with Excel.
>> 
>> - Paul
>> 
>> On Sun, Jan 23, 2022 at 11:59 PM James Turton  wrote:
>> 
>>> I'll prepare a PR that unlomboks everything except contrib.  Since we're
>>> talking about contrib splitting off into one or many independent code
>>> bases (c.f. install "Drill 2 and plug-in organisation"), working to make
>>> it conform to coding standards that we're selecting for core Drill
>>> probably won't pay.
>>> 
>>> On 2022/01/23 01:36, Charles Givre wrote:
>>>> I guess the question is do we de-lombok what has already been done?  I
>>>> really like the builders for plugin configs, but I'm generally in agreement
>>>> that if it is causing problems building, we should ditch it.
>>>> Best,
>>>> -- C
>>>> 
>>>> 
>>>> 
>>>>> On Jan 22, 2022, at 5:02 PM, Ted Dunning  wrote:
>>>>> 
>>>>> The Lombok story is better in IntelliJ, possibly because the Lombok devs
>>>>> use IntelliJ like the majority of devs. Once I knew to install the plugin,
>>>>> things were at least comprehensible.
>>>>> 
>>>>> But the problem is that it isn't obvious. As a newcomer, you don't know
>>>>> what you don't know and because Lombok's major effect is code that isn't
>>>>> there, it isn't obvious where to look.
>>>>> 
>>>>> The point about it not helping that much due to Drill's design (good point,
>>>>> Paul) is apposite, but I think the naive reader issue is even bigger.
>>>>> 
>>>>> Net, as a person who isn't developing anything for Drill just lately, I
>>>>> don't think it's a good idea at all.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Jan 22, 2022 at 6:37 AM luoc  wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I have a story here. In Oct 2021, I upgraded Eclipse to the latest release
>>>>>> (2021-09) and then found out that the Lombok dependency had been added to
>>>>>> the Drill repository, so I installed Lombok (as a new plugin) from the
>>>>>> Eclipse Marketplace as I used to. Finally, I restarted the IDE and prepared
>>>>>> to open the Drill project, but it crashed because of issue #2956 

Re: [ANNOUNCE] James Turton as PMC Member

2022-01-26 Thread luoc


I'll cheer you on.

> On Jan 25, 2022, at 15:49, Paul Rogers  wrote:
> 
> Congratulations James!
> 
> - Paul
> 
>> On Mon, Jan 24, 2022 at 9:34 AM Charles Givre 
>> wrote:
>> 
>> The Project Management Committee (PMC) for Apache Drill is pleased to
>> announce that we have invited James Turton to join us as a PMC member of
>> the Drill project and he has accepted.  Please join me in congratulating
>> James and welcoming him to the PMC!
>> 
>> 
>> Best,
>> Charles Givre
>> PMC Chair, Apache Drill
>> 
>> 
>> 
>> 
>> Charles Givre
>> Founder, CEO DataDistillr
>> Email:  char...@datadistillr.com
>> Phone:  + 443-762-3286
>> Book a Meeting 30 min  • 60
>> min 
>> LinkedIn @cgivre 
>> GitHub @cgivre 
>> 
>> 



Re: [ANNOUNCE] New Committer: PJ Fanning

2022-01-26 Thread luoc


Well done.
Thank you for doing this.

> On Jan 25, 2022, at 16:05, Paul Rogers  wrote:
> 
> Congratulations!
> 
> - Paul
> 
>> On Mon, Jan 24, 2022 at 9:15 AM Charles Givre  wrote:
>> 
>> The Project Management Committee (PMC) for Apache Drill is pleased to
>> announce that we have invited PJ Fanning to join us as a committer to the
>> Drill project.  PJ is a committer and PMC member for the Apache POI project
>> and author of the Excel Streaming library which Drill uses for the Excel
>> reader.  He has contributed numerous fixes and assistance to Drill relating
>> to the Drill's Excel reader.  Please join me in congratulating PJ and
>> welcoming him as a committer!
>> 
>> Best,
>> Charles Givre
>> PMC Chair, Apache Drill
>> 
>> 



Re: January 2021 community meetup notes

2022-01-26 Thread luoc


A new talk!

Kashifuddin is a DBA engineer working at KCloud. He plans to give a talk 
on how to automate the deployment of a Drill cluster on the Google Cloud 
platform.

* It will take about 20 minutes.

> On Jan 27, 2022, at 1:31 PM, James Turton wrote:
> 
> Not too *soon for you to send
> 
> On 2022/01/26 20:30, James Turton wrote:
>> Hi all
>> 
>> Sorry these notes were stuck with me for a while!  The next meetup is next 
>> Friday, not too for you to send anything you would like incorporated.
>> 



Re: using JDBC to connect to Drill

2022-02-03 Thread luoc

Hi Jorge,
It seems that we have answered this question before. Let me find it first..

https://github.com/apache/drill/issues/2415

> On Feb 3, 2022, at 17:28, Jorge Alvarado  wrote:
> 
> Hi Drill community,
> 
> I'm trying to connect to drill 1.19 using JDBC,
> 
> For context: I have a VM running zookeeper and another VM running drillbit.
> The web UI is working fine, the queries are working fine.
> 
> In my maven dependency I have:
> <dependency>
>   <groupId>org.apache.drill.exec</groupId>
>   <artifactId>drill-jdbc-all</artifactId>
>   <version>1.19.0</version>
> </dependency>
> 
> In my code:
> 
>     Connection conn = null;
>     String url = "jdbc:drill:zk= cloud>:2181;schema=common1";
>     String query = "SELECT * FROM `common1`.`products.json` LIMIT 10";
>     Statement stmt = null;
>     ResultSet result = null;
> 
>     conn = DriverManager.getConnection(url);
>     stmt = conn.createStatement();
>     result = null;
>     String column1, column2;
>     result = stmt.executeQuery(query);
> 
> 
> When I run the console app I have  a bunch of errors but the most prominent 
> is:
> CONNECTION : java.net.UnknownHostException: drill3.internal.cloudapp.net: 
> Name or service not known
> 
> drill3.internal.cloudapp.net is exactly the name that appears on the drill 
> web UI for my only drillbit node.
> It makes sense that it cannot resolve as it looks like an internal address, 
> so I updated my hosts file (on my dev pc) to resolve to the public ip address 
> of the drill node, but it still gives me the same error.
> 
> Do you have any ideas how to make my Java app to resolve the internal address?
> 
> thanks in advance
> 
> Jorge
> 
> 
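
One way to narrow down an `UnknownHostException` like the one above is to check, from the same machine and JVM that runs the JDBC client, whether the drillbit host name resolves at all. The sketch below uses only `java.net.InetAddress`; it assumes (this is not confirmed in the thread) that the client ends up connecting to the host name the drillbit registered, such as the one shown in the web UI. The class name and the default argument are made up for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostResolutionCheck {
    /** Returns true if this JVM can resolve the given host name or IP literal. */
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the host names to check as arguments, e.g. the one from the web UI.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String host : hosts) {
            System.out.println(host + " resolvable: " + canResolve(host));
        }
    }
}
```

If the name does not resolve here, the hosts-file entry is probably not being picked up by this JVM (wrong file, wrong machine, or a typo in the name), and the JDBC driver will fail the same way.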


Re: using JDBC to connect to Drill

2022-02-03 Thread luoc

Hi Christian,
  Is it possible to add these tips to the docs? I would recommend adding them 
to the "Troubleshooting" section, thanks!

https://drill.apache.org/docs/troubleshooting/

> On Feb 3, 2022, at 18:46, Z0ltrix  wrote:
> 
> Christian


Re: February meetup recording

2022-02-05 Thread luoc


Congratulations!

> On Feb 5, 2022, at 15:06, James Turton  wrote:
> 
> Apologies, that link appears to be to a recording of a short pre-meetup 
> test call by a community member.  Here's the recording of the meetup proper.
> 
> Topic: Apache Drill Community Meetup
> Date: Feb 4, 2022 07:58 AM Pacific Time (US and Canada)
> 
> Meeting Recording:
> https://datadistillr.zoom.us/rec/share/tFaDw99UfwK_ZxSg60AVM_PHCPdyIZci8MaRkVYQyAHOLiUmATJ4n4uxrkbW0Cm_.mNUcipDuNEcz_u3K
> 
> 
>> On 2022/02/05 09:02, James Turton wrote:
>> Thank you to everyone who joined!  The link to the recording is below and I 
>> understand that a textual summary will follow in due course.
>> 
>> Topic: Apache Drill Community Meetup
>> Date: Feb 4, 2022 07:56 AM Pacific Time (US and Canada)
>> 
>> Meeting Recording:
>> https://datadistillr.zoom.us/rec/share/JcAtM9xZkd4G6ZLkU9r_HHMJASE2efauu8jzSqdOE0_fIcNezvrqF7eGrDbLnGRH.K3SqKeRI2IYsZj2Q
>>  
>> 
>> James


Re: [VOTE] Release Apache Drill 1.20.0 - RC0

2022-02-05 Thread luoc


+1

1. Download and extract the tarballs, start and gracefully stop the Drillbit (in 
distributed mode), submit a built-in sample query, and check the UI and 
profile.

2. Check the log output for start, query, and stop.

3. Run several actual MySQL queries, Phoenix queries, and federated queries.

select a.n_name, b.r_name from phoenix123.test.nation a join 
mysql57.test.region b on a.n_regionkey = b.r_regionkey where b.r_name = 'VALUE';

> On Feb 5, 2022, at 5:31 PM, James Turton wrote:
> 
> P.S. if you know a reason that we should _not_ add 
> apache-drill-1.20.0-hadoop2.tar.gz.* artifacts to our release from the new 
> hadoop-2 profile please call a vote.
> 
> On 2022/02/05 11:11, James Turton wrote:
>> Hi all
>> 
>>  Note from the release manager.
>> 
>> The normal RC announcement follows below but please take note that while you 
>> should test and try this Hadoop 3-based RC 0 of Drill 1.20.0, there is 
>> likely to be another RC which ships both Hadoop 2 and Hadoop 3 builds as 
>> soon as I have got some advice on the best way to incorporate this in our 
>> release process.  However, that RC will be based on exactly the same commit 
>> as this one is (assuming no issues are found), so please do test this one 
>> every bit as much as you would have.
>> 
>> - Thanks, James
>> 
>> I'd like to propose the first release candidate (RC0) of Apache Drill, 
>> version 1.20.0.
>> 
>> The release candidate covers a total of 105 resolved JIRAs [1]. Thanks to 
>> everyone who contributed to this release.
>> 
>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted 
>> at [3].
>> 
>> This release candidate is based on commit 
>> 556b972560911c20691d5b5de6c656d22c59ce0b located at [4].
>> 
>> Please download and try out the release.
>> 
>> The vote ends at 2022-02-08 10:00 UTC ≅ 3×24 hours after the timestamp on 
>> this email.
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> [1] 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820
>> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0/
>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1087/
>> [4] https://github.com/jnturton/drill/commits/drill-1.20.0



Re: [DISCUSS] Some ideas for Drill 1.21

2022-02-06 Thread luoc


Before we discuss the next release, I would like to point out that an Apache 
project should not be directly tied to a commercial company; otherwise the 
community's motivation to contribute will suffer.

Thanks.

> On Feb 6, 2022, at 21:29, Charles Givre  wrote:
> 
> Hello all, 
> Firstly, I wanted to thank everyone for all the work that has gone into Drill 
> 1.20 as well as the ongoing discussion around Drill 2.0.   I wanted to start 
> a discussion around a topic for Drill 1.21, namely INFO_SCHEMA 
> improvements.  As my company wades further and further into Drill, it has 
> become apparent that the INFO_SCHEMA could use some attention.  James Turton 
> submitted a PR which was merged into Drill 1.20, but in so doing he uncovered 
> an entire Pandora's box of other issues which might be worth addressing.  In 
> a nutshell, the issues with the INFO_SCHEMA are all performance related: it 
> can be very slow and also can consume significant resources when executing 
> even basic queries.  
> 
> My understanding of how the info schema (IS) works is that when a user 
> executes a query, Drill will attempt to instantiate every enabled storage 
> plugin to discover schemata and other information. As you might imagine, this 
> can be costly. 
> 
> So, (and again, this is only meant as a conversation starter), I was thinking 
> there are some general ideas as to how we might improve the IS:
> 1.  Implement a limit pushdown:  As far as I can tell, there is no limit 
> pushdown in the IS and this could be a relatively quick win for improving IS 
> query performance.
> 2.  Caching:  I understand that caching is tricky, but perhaps we could add 
> some sort of schema caching for IS queries, or make better use of the Drill 
> metastore to reduce the number of connections during IS queries.  Perhaps in 
> combination with the metastore, we could implement some sort of "metastore 
> first" plan, whereby Drill first hits the metastore for query results and if 
> the limit is reached, we're done.  If not, query the storage plugins...
> 3.  Parallelization:  It did not appear to me that Drill parallelizes IS 
> queries.   We may be able to add some parallelization which would improve 
> overall speed, but not necessarily reduce overall compute cost
> 4.  Convert to EVF2:  Not sure that there's a performance benefit here, but 
> at least we could get rid of cruft
> 5.  Reduce SerDe:   I imagine there was a good reason for doing this, but the 
> IS seems to obtain a POJO from the storage plugin then write these results to 
> old-school Drill vectors.  I'm sure there was a reason it was done this way, 
> (or maybe not) but I have to wonder if there is a more efficient way of 
> obtaining the information from the storage plugin, ideally w/o all the object 
> creation. 
> 
> These are just some thoughts, and I'm curious as to what the community thinks 
> about this.  Thanks everyone!
> -- C
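
Ideas 1 and 2 above (limit pushdown and a "metastore first" plan) can be 
illustrated with a minimal sketch. It is not Drill code: the plugin callables, 
the cached row list and the function names are hypothetical stand-ins, and it 
only assumes that connecting to a storage plugin is the expensive step. Rows 
are produced lazily, so once the limit is satisfied no further plugin is opened.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


# Hypothetical stand-in for a storage plugin: a callable returning a lazy
# iterator of table names. It records when it is actually opened.
def make_plugin(name: str, tables: List[str],
                opened: List[str]) -> Callable[[], Iterator[str]]:
    def scan() -> Iterator[str]:
        opened.append(name)  # assumed to be the costly step (connecting)
        for table in tables:
            yield f"{name}.{table}"
    return scan


def list_tables(cached: Iterable[str],
                plugins: Dict[str, Callable[[], Iterator[str]]],
                limit: int) -> List[str]:
    """Serve cached rows first, then open plugins one by one until the
    limit is met. Plugins that are never reached are never opened."""
    def rows() -> Iterator[str]:
        yield from cached
        for scan in plugins.values():
            yield from scan()
    return list(islice(rows(), limit))


opened: List[str] = []
plugins = {
    "mysql": make_plugin("mysql", ["orders", "users"], opened),
    "mongo": make_plugin("mongo", ["events"], opened),
}
result = list_tables(["dfs.logs"], plugins, limit=2)
print(result)  # ['dfs.logs', 'mysql.orders']
print(opened)  # ['mysql']
```

In this toy run the second plugin is never touched because the limit was 
reached after the first one, which is the saving the limit pushdown is after.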



Re: [VOTE] Release Apache Drill 1.20.0 - RC4

2022-02-18 Thread luoc


Hi Anton,
  Thanks for the feedback. It seems that I solved a similar problem in 1.20 and 
I will confirm it at night.

> On Feb 19, 2022, at 00:42, Anton Gozhiy  wrote:
> 
> Found a regression, please take a look:
> https://issues.apache.org/jira/browse/DRILL-8143
> The case is from Drill Test Framework and it is not reproducible with Drill
> 1.19.0.
> I'm not sure if the issue is significant enough to be a release blocker,
> but I vote -1 for now.
> 
>> On Fri, Feb 18, 2022 at 4:56 PM Charles Givre  wrote:
>> 
>> +1 for release.
>> 
>> Great work everyone!!
>> -- C
>> 
>> 
>>> On Feb 18, 2022, at 8:40 AM, Z0ltrix wrote:
>>> 
>>> +1 for release.
>>> 
>>> - Installed Hadoop2 RC4 in our development environment on AWS EC2 (Ubuntu 18.04)
>>>  - zookeeper 3.6.5
>>>  - hadoop 2.9.2
>>>  - hbase 1.5.0
>>>  - phoenix 4.15.0
>>>  - phoenix-queryserver 1.0.0
>>>  - everything secured by kerberos
>>>  - everything tls encrypted
>>>  - everything impersonated
>>> - Run queries against Parquet files stored in HDFS (impersonated) + INT96 timestamps
>>> - Run queries against HBase (impersonated)
>>> - Run queries against Phoenix (impersonated)
>>> - Run UNION ALL queries against HBase + HDFS (Parquet) to simulate a Lambda dataset
>>> - Run UNION ALL queries against Phoenix + HDFS (Parquet) to simulate a Lambda dataset
>>> - Run ANALYZE TABLE COMPUTE STATISTICS on HDFS Parquet tables (Iceberg Metastore)
>>> - Run ANALYZE TABLE REFRESH METADATA on HDFS Parquet tables (Iceberg Metastore)
>>> - Run queries against the Iceberg metastore to simulate Iceberg format plugin reads
>>> - Tested some Superset and Tableau dashboards over ODBC connection (impersonated)
>>> - Tested some queries from NiFi over JDBC connection
>>> 
>>> Regards
>>> 
>>> 
>>> Christian
>>> 
>>> --- Original Message ---
>>> 
>>> On Thursday, February 17, 2022 at 19:53, James Turton wrote:
>>> 
>>>> Hi all
>>>>
>>>> I'd like to propose the fifth release candidate (RC4) of Apache Drill,
>>>> version 1.20.0 which differs from the previous RC in the following.
>>>>
>>>> DRILL-8139: Parquet CodecFactory thread safety bug (#2463)
>>>> DRILL-8134: Cannot query Parquet INT96 columns as timestamps (#2460)
>>>> DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697 (#2456)
>>>> DRILL-8137: Prevent reading union inputs after cancellation request (#2462)
>>>>
>>>> The release candidate covers a total of 117 resolved JIRAs [1]. Thanks
>>>> to everyone who contributed to this release.
>>>>
>>>> The tarball artifacts are hosted at [2][3] and the maven artifacts are
>>>> hosted at [4][5].
>>>>
>>>> This release candidate is based on commits
>>>> 753bff39d8dd08eaa1273eadc20175d34a87e044 and
>>>> 9955d082bcdba401666799f49a6cd3c3f996af97 located at [6][7].
>>>>
>>>> Please download and try out the release.
>>>>
>>>> [ ] +1
>>>> [ ] +0
>>>> [ ] -1
>>>>
>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820
>>>> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc4/
>>>> [3] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc4/ (Hadoop 2 build)
>>>> [4] https://repository.apache.org/content/repositories/orgapachedrill-1094/
>>>> [5] https://repository.apache.org/content/repositories/orgapachedrill-1095/ (Hadoop 2 build)
>>>> [6] https://github.com/jnturton/drill/commits/drill-1.20.0
>>>> [7] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 (Hadoop 2 build)
>>> 
>> 
>> 
> 
> -- 
> Sincerely, Anton Gozhiy
> anton5...@gmail.com
> 


Re: Superset Drill Time Range Filter

2022-02-23 Thread luoc

Which Drill version are you running?

> On Feb 23, 2022, at 17:57, Z0ltrix  wrote:
> 
> 
> Hi drill devs,
> 
> we have a problem with our superset -> drill connection with time range 
> filters, as described below.
> 
> Superset sends the following to drill:
> WHERE `startTime` >= '2022-02-14 00:00:00.00'
>   AND `startTime` < '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
> 
> and i get the following error:
> 
> SYSTEM ERROR: ClassCastException: 
> org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
> org.apache.drill.exec.expr.holders.TimeStampHolder
> 
> 
> Please, refer to logs for more information.
> 
> 
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: 
> org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
> org.apache.drill.exec.expr.holders.TimeStampHolder
> org.apache.drill.exec.work.foreman.Foreman.run():305
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (java.lang.ClassCastException) 
> org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
> org.apache.drill.exec.expr.holders.TimeStampHolder
> org.apache.drill.exec.expr.FilterBuilder.getValueExpressionFromConst():208
> 
> org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():240
> 
> org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> org.apache.drill.exec.expr.FilterBuilder.generateNewExpressions():268
> org.apache.drill.exec.expr.FilterBuilder.handleCompareFunction():278
> 
> org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():246
> 
> org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> org.apache.drill.exec.expr.FilterBuilder.buildFilterPredicate():80
> 
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate():317
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch():150
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch():103
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule():319
> org.apache.calcite.plan.hep.HepPlanner.applyRule():561
> org.apache.calcite.plan.hep.HepPlanner.applyRules():420
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction():257
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute():127
> org.apache.calcite.plan.hep.HepPlanner.executeProgram():216
> org.apache.calcite.plan.hep.HepPlanner.findBestExp():203
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():419
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():370
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():353
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():536
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():178
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():216
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():121
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():87
> org.apache.drill.exec.work.foreman.Foreman.runSQL():593
> org.apache.drill.exec.work.foreman.Foreman.run():276
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> 
> When i manually resend the query with TIMESTAMP as here:
> WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
>   AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
> 
> Everything is fine, but Superset doesn't create the query this way.
> 
> So, now to my question:
> Is this error message legit because of the missing "TIMESTAMP" before the 
> timestamp string, or do we have a problem here in drill?
> 
> Regards 
> Christian
> 
> --- Original Message ---
> On Wednesday, February 23, 2022 at 10:49, Z0ltrix wrote:
> 
>> Hello superset devs,
>> 
>> we have a problem with our superset -> drill connection with time range 
>> filters.
>> 
>> When we filter a dashboard by time range (last week, month, etc.) I get a 
>> SYSTEM ERROR: ClassCastException: 
>> org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
>> org.apache.drill.exec.expr.holders.TimeStampHolder
>> from drill.
>> 
>> I don't want to talk here too much about the drill error because this is a 
>> topic for the drill project, but i think we could solve this also by adding 
>> something to db_engine_specs/drill.py
>> 
>> Superset sends the following to drill:
>> WHERE `startTime` >= '202
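
The workaround reported above (prefixing the string with TIMESTAMP) could be 
applied on the client side when the filter is generated. The following is a 
minimal sketch under that assumption; the helper names are hypothetical and do 
not claim to be Superset's actual db_engine_specs API.

```python
from datetime import datetime


def to_drill_timestamp_literal(dttm: datetime) -> str:
    """Render a datetime as a typed SQL TIMESTAMP literal (hypothetical helper)."""
    return "TIMESTAMP '" + dttm.strftime("%Y-%m-%d %H:%M:%S") + "'"


def time_range_filter(column: str, start: datetime, end: datetime) -> str:
    """Build a half-open time range filter using typed literals
    instead of bare strings."""
    return (
        f"WHERE `{column}` >= {to_drill_timestamp_literal(start)}\n"
        f"  AND `{column}` < {to_drill_timestamp_literal(end)}"
    )


print(time_range_filter("startTime", datetime(2022, 2, 14), datetime(2022, 2, 21)))
# WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00'
#   AND `startTime` < TIMESTAMP '2022-02-21 00:00:00'
```

Whether the ClassCastException on untyped string literals is itself a Drill 
bug is a separate question that this sketch does not answer.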

Re: thinking of our Ukrainian friends

2022-02-24 Thread luoc


Vitalii and Vova are my Ukrainian friends, hopefully they will stay safe as 
well.

> On Feb 24, 2022, at 14:39, Ted Dunning  wrote:
> 
> For commercial historical reasons many of the people who have contributed
> to Drill live in Ukraine.
> 
> My heart is with them tonight. I hope they stay safe.



Re: [RESULT] [VOTE] Release Apache Drill 1.20.0 - RC5

2022-02-25 Thread luoc


Congratulations. Thank you all for doing everything, 1.21 will be powerful!

> On Feb 25, 2022, at 18:17, James Turton  wrote:
> 
> Tags and branches have been pushed, so I hereby unfreeze drill/master.  The 
> official release announcement will arrive once I see the download mirrors are 
> up to date.
> 
> Looking back I think we should consider making the Hadoop 2 build something 
> that is supported but must be built by end users, just because I didn't find 
> any really natural way to include it in our release process.  It was 
> certainly doable, but it's a little clunky.  Or maybe there are Maven plugin 
> secrets that I don't know... something we can discuss before the next one.
> 
>> On 2022/02/25 09:41, James Turton wrote:
>> The vote passes. Thanks to everyone who has tested the six(!) release 
>> candidates over the last twenty(!) days and given their comments and votes. 
>> Final tally:
>> 
>> 3x +1 (binding): Cong Luo, Charles, James
>> 
>> 2x +1 (non-binding): Jinfeng Ni, Christian
>> 
>> No 0s or -1s.
>> 
>> I'll start the process for pushing the release artifacts and send an 
>> announcement once propagated.
>> 
>> Kind regards
>> James



Re: thinking of our Ukrainian friends

2022-02-25 Thread luoc


Vitalii, please let me know if you need any assistance, the no-fly zone cannot 
stop our concerns!
And are Vova and Anton safe?

> On Feb 25, 2022, at 19:37, Vitalii Diravka  wrote:
> 
> We are trying to be in a safe place, but the second day from time to time
> we hear explosions near Kiev.
> I hope it will be finished soon!
> 
> Thank you all!
> 
> Kind regards
> Vitalii
> 
> 
>> On Thu, Feb 24, 2022 at 4:29 PM Charles Givre  wrote:
>> 
>> I would also like to express my sympathy and support for Arina, Vova,
>> Vitalii, Igor, Anton and the people of Ukraine.
>> -- C
>> 
>>>> On Feb 24, 2022, at 5:07 AM, James Turton  wrote:
>>> 
>>> I too would like to express my sympathy and solidarity.
>>> 
>>> On 2022/02/24 11:43, Z0ltrix wrote:
>>>> oh my goodness i hope this will end soon.
>>>> Stay safe!
>>>> 
>>>> --- Original Message ---
>>>> 
>>>> On Thursday, February 24, 2022 at 10:24, luoc wrote:
>>>> 
>>>>> Vitalii and Vova are my Ukrainian friends, hopefully they will stay
>> safe as well.
>>>>> 
>>>>>> On Feb 24, 2022, at 14:39, Ted Dunning ted.dunn...@gmail.com wrote:
>>>>>> 
>>>>>> For commercial historical reasons many of the people who have
>> contributed
>>>>>> 
>>>>>> to Drill live in Ukraine.
>>>>>> 
>>>>>> My heart is with them tonight. I hope they stay safe.
>>> 
>> 
>> 
> 



Re: I broke the Travis CI

2022-02-25 Thread luoc


Received.

> On Feb 25, 2022, at 19:50, James Turton  wrote:
> 
> It's just the CI, not a problem with Drill 1.20 on ARM (which I've tested).  
> So no need to fret when you see the red X in Travis, I'll take care of it 
> soon.
> 
> James



Re: New Committer: Tengfei Wang

2022-03-03 Thread luoc


Congratulations. 

> On Mar 3, 2022, at 20:56, James Turton  wrote:
> 
> Welcome Tengfei!
> 
>> On 2022/03/03 14:52, Charles Givre wrote:
>> The Project Management Committee (PMC) for Apache Drill
>> has invited Tengfei Wang to become a committer and we are pleased
>> to announce that he has accepted.
>> 
>> Being a committer enables easier contribution to the
>> project since there is no need to go via the patch
>> submission process. This should enable better productivity.
>> A PMC member helps manage and guide the direction of the project.
>> Please join me in congratulating Tengfei!
>> 
>> 



Re: [DISCUSS] Pull Request Cleanup

2022-03-05 Thread luoc


Hi Charles,
  I prefer the "timeout" bot; that is a good step. However, some PRs may be 
blocked by another PR, so I recommend that we raise the timeout to 120 or 150 
days.

Thanks.

> On Mar 4, 2022, at 22:57, Charles Givre  wrote:
> 
> Hi Christian, 
> Thanks for your input.  First of all, Drill is clearly a complex system so 
> PRs do tend to take a long time to get merged.  One option might be to use a 
> bot like stale [1] which automatically closes PRs after a period of 
> inactivity. 
> 
> Personally, I'd set the "timeout" period to 90 days.
> Best,
> -- C
> 
> 
> [1]:  https://github.com/apps/stale 
> 
> 
>> On Mar 3, 2022, at 3:51 PM, Z0ltrix  wrote:
>> 
>> Hi Charles,
>> 
>> what process would you suggest?
>> 
>> I would think some devs are using a PR to keep the work open for memory 
>> and/or others can discuss it but of course, if its stale for months maybe it 
>> will never make any more progress.
>> Perhaps someone could trigger a comment and ask for further development, but 
>> who would be responsible for that trigger?
>> 
>> Regards
>> Christian
>> 
>> 
>> 
>> 
>> --- Original Message ---
>> On March 3, 2022, at 17:54, Charles Givre wrote:
>> 
>> Hello all,
>> I wanted to discuss the possibility of doing a cleanup of open and stale 
>> pull requests. There seem to be about 10 PRs that are actively being worked, 
>> then we have a bunch of PRs of various stages of staleness.
>> 
>> What do you all think about having some sort of process for closing out old 
>> PRs that are not actively being worked?
>> Best,
>> -- C
>> 
>> 
> 


Re: Keynote for Drill meetup on Mar 5, 2022

2022-03-06 Thread luoc


Thank you very much, Jingchuan.

> On Mar 6, 2022, at 17:56, Jingchuan Hu  
> wrote:
> 
> Hi team,
> 
> Here is the keynote with the time-stamp, which helps you to catch up with
> the meetup for better convenience.
> 
> 0:00 - 16:37: Apache Drill community members assembled.
> 16:38 - 28:50: New Committer: Tengfei Wang introduced the realtime data
> warehouse V2.0 based on Drill which was developed and deployed in Alipay.
> 28:51 - 1:11:20: Q&A for Tengfei Wang's sharing.
> 
> The recording has been uploaded to zoom cloud by James.
> https://datadistillr.zoom.us/rec/share/GLwjAISSWyIF45ZxZUKb_JZ3cBLi6vEEGzLy8SIrzwmzkzjAJuwlUwr_h6jaLsnJ.ecfcFMA1NwTcmEd7
> Passcode: V0xT0Oc=
> 
> I uploaded it to Youtube with time-marker hyperlinks, welcome to share it
> to more people who are interested in Drill.
> https://www.youtube.com/watch?v=aHHA-Aaua2g
> 



Re: [VOTE] Adopt the Drill Test Framework from MapR

2022-03-18 Thread luoc
Hi all,

If the new repository is created, I hope it counts as new progress on 
DRILL-8120.

Then, can test framework contributors answer the following questions?

1. Who are the active contributors to this test framework?

2. Are we planning to add more contributors (and how do we implement it)?

3. Can the contributors outline the design idea and function set of the test 
framework here?

Again, thanks to Charles's company and the HPE team for supporting Apache Drill.

Thanks.

> On Mar 18, 2022, at 12:56, Paul Rogers wrote:
> 
> Abhishek used to have that thing running like a charm. Great to see it
> getting attention again.
> 
> +1
> 
> - Paul
> 
> On Thu, Mar 17, 2022 at 2:03 AM James Turton  wrote:
> 
>> Hi dev community!
>> 
>> Many of you need no introduction to the test framework developed by MapR
>> 
>> https://github.com/mapr/drill-test-framework
>> 
>> . For those who don't know, the test framework contains around 10k tests
>> often exercising scenarios not covered by Drill's unit tests. Just weeks
>> ago it revealed a regression in a Drill 1.20 RC and saved us from
>> shipping with that bug. The linked repository has been dormant for going
>> on two years but I am aware of bits of work that have been done on the
>> test framework since, and today Anton is actively dusting off and
>> updating it. Since the codebase is under the Apache 2.0 license, we are
>> free to bring a copy into the Drill project. I've created a new
>> (currently empty) possible home for the test framework at
>> 
>> https://github.com/apache/drill-test-framework
>> 
>> Before I proceed to push a clone there, please vote if you support or
>> oppose our adoption of the test framework.
>> 
>> P.S. I have also sent a message to a contact at HPE just in case they
>> might be aware of some concern applicable to our copying this repo but,
>> given the license applied, I cannot see that there will be one.
>> Should anything get raised (and we'd decided to proceed) I would, of
>> course, pause so that we can discuss.
>> 
>> Regards
>> James
>> 



Re: Keynote for Drill meetup on April 1, 2022

2022-04-06 Thread luoc


Great. I welcome the integration test framework. Also, thanks to Anton.

> On Apr 4, 2022, at 17:40, Jingchuan Hu  
> wrote:
> 
> Anton



[DISCUSS] Add schema support for the XML format

2022-04-06 Thread luoc

Hello dear Drillers,

Before starting the topic, I would like to do a simple survey :

1. Did you know that Drill already supports XML format?

2. If yes, what is the maximum size for the XML files you normally read? 1MB, 
10MB or 100MB

3. Do you expect that reading XML will be as easy as JSON (Schema Discovery)?

Thank you for responding to those questions.

XML is different from JSON, and if we rely solely on Drill to deduce the 
structure of the data (the schema), the code will become very complex and 
fragile.

For example, inferring array structure and numeric ranges. So a "provided 
schema" or "TO_JSON" may be a good remedy:

Provided Schema

We can add DTD or XML Schema (XSD) support for XML. This lets Drill build all 
value vectors (writers) before reading data, resolving the fields, types, and 
complex nesting up front.

However, a definition file is actually a rule validator that allows elements to 
appear 0 or more times. As a result, it is not possible to know if all elements 
exist until the data is read.

Therefore, avoid creating a large number of value vectors that do not actually 
exist before reading the data.

We can build the top schema at the initial stage and add new value vectors as 
needed during the reading phase.

TO_JSON

Read and convert XML directly to JSON, using the JSON Reader for data 
resolution.

It makes it easier for us to query XML data as if it were JSON, but requires 
reading the whole XML file into memory.
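
As a rough illustration of the TO_JSON idea and of why array inference is 
delicate without a schema, here is a minimal sketch. It is not Drill code: the 
function names are hypothetical, attributes and mixed content are ignored, and 
the whole document is parsed in memory. The same "item" field comes out as a 
scalar or as an array depending only on how many times the element appears.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Convert an element to plain Python data (attributes are ignored)."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text as a string (or None)
    obj = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in obj:
            # A repeated tag is promoted to a list only when a second
            # occurrence is seen; this is the inference problem.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    return obj


def xml_to_json(xml_text):
    """Parse the whole XML document in memory and serialize it as JSON."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)})


one = xml_to_json("<order><item>a</item></order>")
two = xml_to_json("<order><item>a</item><item>b</item></order>")
print(one)  # {"order": {"item": "a"}}
print(two)  # {"order": {"item": ["a", "b"]}}
```

A provided schema (DTD or XSD) would remove this ambiguity by declaring up 
front that "item" may repeat.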

I think the two can be done, so I look forward to your spirited discussion.

Thanks.

- luoc


Re: Next Version

2023-12-10 Thread luoc
Hello all,
  1.22 will be a more stable version. This is a digression: Is Paul still 
interested in participating in the EVF V2 refactoring in the framework? I would 
like to offer time to assist him.

luoc

> On Dec 9, 2023, at 01:01, Charles Givre wrote:
> 
> Hello all, 
> Happy Friday everyone!   I wanted to raise the topic of getting a Drill minor 
> release out the door before the end of the year.   My opinion is that I'd 
> really like to release Drill 1.22 once the integration with Apache Daffodil 
> is complete, but it sounds like that is still a few weeks away.  
> 
> What does everyone think about issuing a maintenance release before the end 
> of the year?  There are a number of significant fixes including some security 
> updates and a major bug in the ES plugin that basically makes it unusable.
> Best,
> -- C



Re: Next Version

2024-01-02 Thread luoc
>> Drill, Windows support, rapid installation and setup, low "time to insight".
>>> 
>>> I'm not going so far as to suggest that Drill be thought of as desktop
>> software, rather that ad hoc Drill deployments working on small (Gb) to big
>> (Tb) data may be as, or more, important than long lived, heavily
>> integrated, professionally managed deployments working on really Big data
>> (Pb). Perhaps the last category belongs almost entirely to BigQuery,
>> Athena, Snowflake and the like nowadays anyway.
>>> 
>>> I still think a cluster is often the most effective way to deploy
>> Drill so the question contemplated is really "Can we make it faster and
>> easier to spin up a cluster (and embedded Drill), connect to data sources
>> and start running (successful) queries"?
>>> 
>>> On 2024/01/01 07:33, James Turton wrote:
>>>> P.S. I also have an admittedly vague idea about deprecating the UNION
>> data type, which still breaks things in many operators, in favour of a
>> different approach where we kick any invalid data encountered while loading
>> column FOO out to a generated _FOO_EXCEPTIONS VARCHAR (or VARBINARY, though
>> binary data formats tend not to be malformed?) column. This would let a
>> query over dirty data complete without invisible data swallowing, and would
>> mean we could cut further effort on UNION support.
>>>> 
>>>> On 2024/01/01 07:11, James Turton wrote:
>>>>> Happy New Year!
>>>>> 
>>>>> Here's another two cents. Make that five now that I scan this email
>> again!
>>>>> 
>>>>> Excluding our Docker Hub images (which are popular), Drill is
>> downloaded ~1000 times a month [1] (order of magnitude, it's hard to count
>> genuinely new installations from web server downloads).
>>>>> 
>>>>> What roles are these folks in? I'm a data engineer by day and I don't
>> think that we count for a large share of those downloads. The DEs I work
>> with are risk averse sorts that tend to favour setups with rigid schemas
>> early on and no surprises for their users at query time. Add to that a
>> second stat from the download data: the biggest single download user OS is
>> Windows, at about 50% [1]. Some of these users may go on to copy that
>> download to a server environment but I have a theory that many of them go
>> on to run embedded Drill right there on beefy Windows laptops.
>>>>> 
>>>>> I conjecture that most of the people reaching for Drill are analysts
>> or developers working _away_ from an established, shared data
>> infrastructure. There may not be any shared data engineering where they
>> are, or they may find themselves in a fashionable "Data Mesh" environment
>> [2]. I'm probably abusing Data Mesh a bit here in that I'm told that it
>> mainly proposes a federation of distinct data _teams_, rather than of data
>> _systems_ but, if you entertain my cynical formulation of "Data Mesh guys!
>> Silos aren't uncool any more!" just a bit, then you can well imagine why a
>> user in a Data Mesh might look for something like Drill to combine data
>> from different silos on their own machine. Tangentially this suggests to me
>> that we should keep putting effort into: embedded Drill, Windows support,
>> rapid installation and setup, low "time to insight".
>>>>> 
>>>>> MongoDB questions still come up frequently giving a reason beyond the
>> JSON files questions to think that the JSON data model is still very
>> important. Wherever we decide to bound the current EVF v2 data model
>> implementation, maybe we can sketch out a design of whatever is
>> unimplemented in some updates to the Drill wiki pages? This would give
>> other devs a head start if we decide that some unsupported complex data
>> type is worth implementing down the road?
>>>>> 
>>>>> 1. https://infra-reports.apache.org/#downloads&project=drill
>>>>> 2. https://martinfowler.com/articles/data-mesh-principles.html
>>>>> 
>>>>> Regards
>>>>> James
>>>>> 
>>>>> On 2024/01/01 03:16, Charles Givre wrote:
>>>>>> I'll throw my .02 here...  As a user of Drill, I've only had the
>> occasion to use the Union once. However, when I used it, it consumed so
>> much memory, we ended up finding a workaround anyway and stopped using it.
>> Honestly, since we improved the implicit casting rules, I thi

I want to subscribe to you

2020-02-04 Thread luoc
I want to subscribe to you

Access request to dev and users slack channel

2019-07-21 Thread luoc
Hello,


I am interested to start to make contributions and I want to request access
to the Drill slack channels for the email address
luocoo...@qq.com.


Thank you.