Question Regarding How to search excel file using drill.

2016-02-26 Thread Sanjiv Kumar C
Hello
   I want to know how to query an Excel file (.xlsx) and an MS Access
file using Drill.

-- 
Thanks & Regards
Sanjiv Kumar


Re: Question Regarding How to search excel file using drill.

2016-02-26 Thread Andries Engelbrecht
You can use the Drill ODBC driver, and then query Drill using MS Query or the 
same way you would query another DB engine through an ODBC DSN.

--Andries




Re: Question Regarding How to search excel file using drill.

2016-02-26 Thread Rajkumar Singh
Sanjiv,
This post may help you set up Drill with MS Excel using the Drill ODBC driver:
 http://rajkrrsingh.blogspot.in/2015/11/apache-drill-run-query-through.html 





Re: The praises for Drill

2016-02-26 Thread Parth Chandra
Welcome back Edmon, and thanks for the praise :). Hope to see you on the
next hangout.

On Thu, Feb 25, 2016 at 7:27 PM, Edmon Begoli  wrote:

> Hello fellow Drillers,
>
> I have been inactive on the development side of the project, as we have
> been busy being heavy/power users of Drill over the last few months.
>
> I just want to share some great experiences with the latest versions of
> Drill.
>
> Just tonight, as we were scrambling to meet the deadline, we were able to
> query two years of flat psv files of claims/billing and clinical data in
> Drill in less than 60 seconds.
>
> No ETL, no warehousing - just plain SQL against tons of files. Run SQL, get
> results.
>
> Amazing!
>
> We have also done some much more important things, and we had a paper
> accepted to Big Data Services about the experiences. The co-author of the
> paper is Drill's own Dr. Ted Dunning :-)
> I will share it once it is published.
>
> Anyway, cheers to all, and hope to re-join the dev activities soon.
>
> Best,
> Edmon
>


Re: The praises for Drill

2016-02-26 Thread Abdel Hakim Deneche
Looking forward to reading the paper!




-- 

Abdelhakim Deneche

Software Engineer

  





Re: Avro support in Drill - Missing support for the IN operator and other frustrating things

2016-02-26 Thread Jason Altekruse
Stefan,

I'm sorry that we have not been better about getting back to the issues you
have filed against the Avro reader. We do appreciate all of the effort you
have put into filing thorough bugs and being active in the discussions on
the list. I have responded on the bug you filed on this issue [1] with a
workaround and will be posting a patch shortly with a fix.

- Jason 

[1] - https://issues.apache.org/jira/browse/DRILL-4441


On Thu, Feb 25, 2016 at 12:29 PM, Stefán Baxter 
wrote:

> Hi,
>
> This query targets Avro files in the latest 1.5 release:
>
> 0: jdbc:drill:zk=local> select count(*) from
> dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to =
> 'Customer/4-2492847';
> +---------+
> | EXPR$0  |
> +---------+
> | 5788    |
> +---------+
>
> 0: jdbc:drill:zk=local> select count(*) from
> dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to IN
> ('Customer/4-2492847');
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
>
> This shows that the IN operator does not work with Avro (it works with
> Parquet).
>
> This finally tips us over. We have invested hundreds of hours moving all
> streaming/fresh data from JSON to Avro but the Avro part of Drill is broken
> in too many ways to recommend its use to anyone.
>
> Attempts to report Avro errors and shortcomings, like the missing support
> for dirX, have had no results.
>
> I think it would be prudent to warn people on the Drill website that the
> Avro support is experimental, at best.
>
> - Stefán Baxter
>


extractHeader in session variable?

2016-02-26 Thread Christopher Matta
Is it possible to set the extractHeader option for CSV/TSV in a session
variable? Doing it on the format type is just too broad sometimes and I'd
like to be able to set it based on the files I'm querying.


Chris Matta
cma...@mapr.com
215-701-3146
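
One per-query alternative, sketched here under assumptions: since Drill 1.4,
format attributes can be passed as table function parameters, which scopes
extractHeader to a single query rather than to the whole format type. This is
not a session variable, and the file path below is hypothetical.

```sql
-- Hypothetical path; type, fieldDelimiter and extractHeader are text-format
-- attributes passed for this query only.
SELECT *
FROM table(dfs.`/tmp/example.csv`(type => 'text',
                                  fieldDelimiter => ',',
                                  extractHeader => true));
```

Attributes that are not listed in the table function may fall back to defaults
rather than to the values in the storage plugin's format configuration, so it
is worth checking the result against that configuration.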


RE: Drill error with large sort

2016-02-26 Thread Paul Friedman
Thanks for this; these parameters fixed it!

---Paul


-----Original Message-----
From: Abdel Hakim Deneche [mailto:adene...@maprtech.com]
Sent: Thursday, February 25, 2016 5:32 PM
To: user 
Subject: Re: Drill error with large sort

Not so short answer:

In Drill 1.5 (I assume you are using 1.5) we have an improved allocator that
better tracks how much memory each operator is using. In your case it seems
that the data has very wide columns that are causing Sort to choke on the
very first batch of data (1024 records taking up 224MB!!!) because it's way
more than its memory limit (around 178MB in your particular case).
Drill uses a fancy equation to compute this limit and increasing the
aforementioned option will increase the sort limit. More details here:

http://drill.apache.org/docs/configuring-drill-memory/
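
For reference, the option referred to here is
planner.memory.max_query_memory_per_node (named in the short answer quoted
below). A minimal sketch of raising it for the current session follows. It
assumes the reported allocator limit of 178956970 bytes is the 2GB default
divided by 12; that divisor is an inference from the numbers in the quoted
error, not something stated in this thread.

```sql
-- Default: 2 GB = 2147483648 bytes. 2147483648 / 12 = 178956970 (rounded
-- down), which matches the "allocator limit" in the quoted error. The
-- divisor 12 is an assumption inferred from those two numbers.
-- With 4 GB and the same divisor, the limit would be about 357913941 bytes,
-- which is above the 224287987 bytes the first batch needed.
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296;
```

ALTER SESSION only affects the current connection; ALTER SYSTEM with the same
option name would apply the change to all sessions.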

On Thu, Feb 25, 2016 at 5:26 PM, Abdel Hakim Deneche 
wrote:

> Short answer:
>
> Increase the value of planner.memory.max_query_memory_per_node. By
> default it's set to 2GB; try setting it to 4 or even 8GB. This should get
> the query to pass.
>
> On Thu, Feb 25, 2016 at 5:24 PM, Jeff Maass  wrote:
>
>>
>> If you are open to changing the query:
>>   # try removing the functions on the 5th column
>>   # is there any way you could further limit the query?
>>   # does the query finish if you add a limit / top clause?
>>   # what do the logs say?
>>
>> 
>> From: Paul Friedman 
>> Sent: Thursday, February 25, 2016 7:07:12 PM
>> To: user@drill.apache.org
>> Subject: Drill error with large sort
>>
>> I’ve got a query reading from a large directory of parquet files (41
>> GB) and I’m consistently getting this error:
>>
>>
>>
>> Error: RESOURCE ERROR: One or more nodes ran out of memory while
>> executing the query.
>>
>> Unable to allocate sv2 for 1023 records, and not enough batchGroups
>> to spill.
>> batchGroups.size 0
>> spilledBatchGroups.size 0
>> allocated memory 224287987
>> allocator limit 178956970
>> Fragment 0:0
>>
>> [Error Id: 878d604c-4656-4a5a-8b46-ff38a6ae020d on
>> chai.dev.streetlightdata.com:31010] (state=,code=0)
>>
>>
>>
>> Direct memory is set to 48GB and heap is 8GB.
>>
>>
>>
>> The query is:
>>
>>
>>
>> select probe_id, provider_id, is_moving, mode,
>>        cast(convert_to(points, 'JSON') as varchar(1))
>> from dfs.`/home/paul/data`
>> where start_lat between 24.4873780449008 and 60.0108911181433
>>   and start_lon between -139.065890469841 and -52.8305074899881
>>   and provider_id = '343'
>>   and mod(abs(hash(probe_id)), 100) = 0
>> order by probe_id, start_time;
>>
>>
>>
>> I’m also using the “example” drill-override configuration.
>>
>>
>>
>> Any help would be appreciated.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> ---Paul
>>
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
>



-- 

Abdelhakim Deneche

Software Engineer

  

