How to verify predicate pushdown

2017-10-26 Thread PROJJWAL SAHA
Hello, one question: how do I verify whether predicate pushdown is happening? I have one Parquet file generated using a CTAS command. I have executed REFRESH METADATA. I am firing a simple query with a WHERE clause. In the physical plan for the scan operation, I see rowcount as total number of rows
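
A rough sketch of the steps this message describes, written in Drill SQL. The workspace `dfs.tmp`, the table name `sales_parquet`, the source path, and the `amount` column are hypothetical placeholders, not taken from the thread. It also assumes that "REFRESH METADATA" in the message refers to Drill's `REFRESH TABLE METADATA` statement.

```sql
-- Hypothetical names throughout; adjust to the actual workspace and data.

-- 1. Create a Parquet table with CTAS
--    (assumes store.format is left at parquet, Drill's default).
CREATE TABLE dfs.tmp.`sales_parquet` AS
SELECT * FROM dfs.`/data/sales.json`;

-- 2. Build the Parquet metadata cache for the new table.
REFRESH TABLE METADATA dfs.tmp.`sales_parquet`;

-- 3. Print the physical plan for a filtered query instead of running it.
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT * FROM dfs.tmp.`sales_parquet` WHERE amount > 100;
```

With `INCLUDING ALL ATTRIBUTES`, the plan text should list estimated row counts per operator, so the Scan line can be compared against the table's total row count.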

Re: Benchmark numbers using Drill

2017-10-20 Thread PROJJWAL SAHA
> have to revisit the kinds of analytics you would like your end users to > have. Which again raises the question-what kinds of analytics truly > generate value for the BI user? > > Best, > Saurabh > > On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA > wrote: > > >

Benchmark numbers using Drill

2017-10-18 Thread PROJJWAL SAHA
Hi, is there any public performance benchmark that users have achieved using Drill in production scenarios? It would be useful if someone could pass me links to customer stories. Regards

Re: Exception while reading parquet data

2017-10-16 Thread PROJJWAL SAHA
Did Parth's suggestion of > store.parquet.reader.pagereader.bufferedread=false > resolve the issue? > > Also share the details of the hardware setup... #nodes, Hadoop version, > etc. > > > -----Original Message- > From: PROJJWAL SAHA [mailto:proj.s...@gmail.com]

Re: Exception while reading parquet data

2017-10-15 Thread PROJJWAL SAHA
iggers this? > > You can also try turning off the buffering reader. >store.parquet.reader.pagereader.bufferedread=false > > With async reader on and buffering off, you might not see any degradation > in performance in most cases. > > > > On Thu, Oct 12, 2017 at 2:08
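
The two reader options mentioned in this thread can be set per session. The following is a minimal sketch, assuming a Drill session (for example in sqlline) and the option names exactly as quoted in the messages. The final query against `sys.options` is an added check, not something the thread itself suggests.

```sql
-- Turn off the buffering page reader for the current session,
-- as suggested in the reply above.
ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;

-- Keep the async page reader on (the combination described above).
-- Setting this to false instead is the other suggestion made in the thread.
ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;

-- Added check: list the current values of both options.
SELECT * FROM sys.options
WHERE name LIKE 'store.parquet.reader.pagereader%';
```

Session-level settings apply only to the connection that issued them, so the failing query has to be rerun in the same session to see any effect.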

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
ava.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121] at java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121] at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) ~[na:1.8.0_121] at org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.j

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
you try disabling async parquet reader to see if problem gets resolved. > > > alter session set `store.parquet.reader.pagereader.async`=false; > > Thanks, > > Arjun > > > > From: PROJJWAL SAHA > Sent: Wednesday, October 11, 2017 2:

Exception while reading parquet data

2017-10-11 Thread PROJJWAL SAHA
I get the below exception when querying Parquet data on Oracle Storage Cloud service. Any pointers on what this points to? Regards, Projjwal ERROR o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet. Error was : null 2017-

Re: Exception when querying parquet data

2017-10-11 Thread PROJJWAL SAHA
`data25Goct6/websales` ? > > Thanks > Padma > > > On Oct 9, 2017, at 5:50 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote: > > Hello all, > > I am getting the below exception when querying parquet data stored in > storage cloud service. What does this e

Exception when querying parquet data

2017-10-09 Thread PROJJWAL SAHA
Hello all, I am getting the below exception when querying Parquet data stored in a storage cloud service. What does this exception point to? The query on the same Parquet files works when they are stored in Alluxio, which means the data is fine. I am using drill 11.1. Any help is appreciated! Regar

Re: Enable debugging for 3rd party storage plugin with eclipse

2017-03-23 Thread PROJJWAL SAHA
and put it in the classpath of drill to enable me to debug the code at runtime. Please help me here. Regards, Projjwal On Sun, Mar 19, 2017 at 10:43 PM, PROJJWAL SAHA wrote: > Hi all, > > I am trying to debug a 3rd party storage plugin and I need to enable debug > with my ecli

Enable debugging for 3rd party storage plugin with eclipse

2017-03-19 Thread PROJJWAL SAHA
Hi all, I am trying to debug a 3rd party storage plugin and I need to enable debug with my eclipse IDE. Can someone pls guide me on the steps to enable debugging for eclipse - any documentation / link would also help. Also are the steps same if I would want to debug drill codebase ? Regards, Proj

Re: Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
ree million rows is too many rows, for sqlline to print. > > Try doing a COUNT(*) and see if that query returns the correct count on > that table. > > > Thanks, > > Khurram > > ____ > From: PROJJWAL SAHA > Sent: Wednesda

Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
All, I am using drill-conf from the command line to display a query result like select * from xxx having 3 million rows. The screen display scrolls fast to display the result; however, it stops after some time with this exception - java.lang.NegativeArraySizeException at org.apache.drill.exec.vector.V

Query on .gz.parquet files

2017-03-09 Thread PROJJWAL SAHA
All, one question: I am querying .gz.parquet files. select * from xxx returns data like +-+ | current | +-+ | {"vendor_id":"VTS","pickup_datetime":"ACj75+tEAAAvfSUA","payment_type":"CSH","fare_amount":12.0,"mta_tax":0.5,"tip_amount":0.0,"tolls_amount":5.33,"total_amount":18.33,"r

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-07 Thread PROJJWAL SAHA
e next thing I would try is to make your cluster a single node > cluster first and then run the same explain plan query separately on each > individual file. > > > > On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" wrote: > > > Hi Rahul, > > > > thanks for your

Fwd: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread PROJJWAL SAHA
dfs storage plugin. Query planning time is approx 30 secs. Query execution time is approx 1.5 secs. Regards, Projjwal -- Forwarded message -- From: PROJJWAL SAHA Date: Fri, Mar 3, 2017 at 5:06 PM Subject: Minimise query plan time for dfs plugin for local file system on tsv file To

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-05 Thread PROJJWAL SAHA
having some effect on the planning... > > On Fri, Mar 3, 2017 at 6:08 AM, PROJJWAL SAHA wrote: > > > I did not change the default values used by drill. > > Are you talking of changing planner.memory_limit > > and planner.memory.max_query_memory_per_node ? > > If

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-03 Thread PROJJWAL SAHA
you set for planner ? > > On Fri, Mar 3, 2017 at 5:06 PM, PROJJWAL SAHA wrote: > > > Hello all, > > > > I am querying select * from dfs.xxx where yyy (filter condition) > > > > I am using dfs storage plugin that comes out of the box from drill on a > > 1

Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-03 Thread PROJJWAL SAHA
Hello all, I am querying select * from dfs.xxx where yyy (filter condition). I am using the dfs storage plugin that comes out of the box with Drill on a 1GB file, local to the drill cluster. The 1GB file is split into 10 files of 100 MB each. As expected I see 11 minor and 2 major fragments. The drill c

Distribution of workload across nodes in a cluster

2017-02-22 Thread PROJJWAL SAHA
Hello, I am doing select * query on a csv file of 1 GB with a 5 node drill cluster. The csv file is stored in another storage cluster within the enterprise. In the query profile, I see one major fragment and within the major fragment, I see only 1 minor fragment. The hostname for the minor fragme

Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
egion. > > In either case, from AWS console you can figure out how much network > throughput you are getting if that is the bottleneck > Also drill machines would need CPU so along with 32GB memory if you have 8 > cores that would be desirable > > On Tue, Feb 21, 2017 at 2:17 PM

Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
hink majority of the time is spent on displaying the result set instead > of querying the file if the drill server is on aws. > If the drill server is local then it might be your network which might take > a lot of time based on s3 bucket location and where your drill server is > > O

Query on performance using Drill and Amazon s3.

2017-02-20 Thread PROJJWAL SAHA
Hello all, I am using 1GB of data in the form of a .tsv file, stored in Amazon S3, with Drill 1.8. I am using the default configuration of Drill with the S3 storage plugin that comes out of the box. The drillbits are configured on a 5 node cluster with 32GB RAM and 4 vCPUs. I see that select * from xxx; query ta

Re: Query on performance using Drill and Amazon S3

2017-02-20 Thread PROJJWAL SAHA
On Mon, Feb 20, 2017 at 5:29 PM, PROJJWAL SAHA wrote: > I am using 1GB data in the form of .tsv file, stored in Amazon S3 using > Drill 1.8. I am using default configurations of Drill using S3 storage > plugin coming out of the box. The drill bits are configured on a 5 node > clust