Hello,
One question,
How can I verify whether predicate pushdown is happening?
I have one parquet file generated using a CTAS command. I have executed
REFRESH METADATA. I am running a simple query with a WHERE clause. In the
physical plan for the scan operation, I see rowcount as total number of
rows
> have to revisit the kinds of analytics you would like your end users to
> have. Which again raises the question: what kinds of analytics truly
> generate value for the BI user?
>
> Best,
> Saurabh
>
> On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA
> wrote:
>
> >
Hi,
Is there any public performance benchmark that users have achieved using
Drill in production scenarios? It would be useful if someone could pass me
any links to customer user stories.
Regards
Did Parth's suggestion of
> store.parquet.reader.pagereader.bufferedread=false
> resolve the issue?
>
> Also share the details of the hardware setup... #nodes, Hadoop version,
> etc.
>
>
> -----Original Message-
> From: PROJJWAL SAHA [mailto:proj.s...@gmail.com]
iggers this?
>
> You can also try turning off the buffering reader.
> store.parquet.reader.pagereader.bufferedread=false
>
> With async reader on and buffering off, you might not see any degradation
> in performance in most cases.
>
>
>
> On Thu, Oct 12, 2017 at 2:08
ava.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121]
at java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121]
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) ~[na:1.8.0_121]
at org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.j
you try disabling async parquet reader to see if problem gets resolved.
>
>
> alter session set `store.parquet.reader.pagereader.async`=false;
>
> Thanks,
>
> Arjun
>
>
>
> From: PROJJWAL SAHA
> Sent: Wednesday, October 11, 2017 2:
I get the exception below when querying parquet data on Oracle Storage Cloud
service.
Any pointers on what this points to?
Regards,
Projjwal
ERROR o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream
part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet. Error was : null
2017-
`data25Goct6/websales` ?
>
> Thanks
> Padma
>
>
> On Oct 9, 2017, at 5:50 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> Hello all,
>
> I am getting the exception below when querying parquet data stored in
> storage cloud service. What does this e
Hello all,
I am getting the exception below when querying parquet data stored in
storage cloud service. What does this exception point to?
The query on the same parquet files works when they are stored in
Alluxio, which means the data is fine.
I am using drill 11.1
Any help is appreciated!
Regar
and put it in the classpath of drill to enable me to debug the code
at runtime.
Please help me here.
Regards,
Projjwal
On Sun, Mar 19, 2017 at 10:43 PM, PROJJWAL SAHA wrote:
> Hi all,
>
> I am trying to debug a 3rd party storage plugin and I need to enable debug
> with my ecli
Hi all,
I am trying to debug a 3rd party storage plugin and I need to enable debugging
with my Eclipse IDE. Can someone please guide me on the steps to enable
debugging for Eclipse? Any documentation or link would also help. Also, are
the steps the same if I want to debug the Drill codebase?
Regards,
Proj
ree million rows is too many rows, for sqlline to print.
>
> Try doing a COUNT(*) and see if that query returns the correct count on
> that table.
>
>
> Thanks,
>
> Khurram
>
> ____
> From: PROJJWAL SAHA
> Sent: Wednesda
All,
I am using drillconf from command line to display a query result like
select * from xxx
having 3 million rows. The screen display scrolls fast to display the
result, however, it stops after some time with this exception -
java.lang.NegativeArraySizeException
at
org.apache.drill.exec.vector.V
All,
One question:
I am querying .gz.parquet files.
select * from xxx returns data like
+---------+
| current |
+---------+
|
{"vendor_id":"VTS","pickup_datetime":"ACj75+tEAAAvfSUA","payment_type":"CSH","fare_amount":12.0,"mta_tax":0.5,"tip_amount":0.0,"tolls_amount":5.33,"total_amount":18.33,"r
e next thing I would try is to make your cluster a single node
> cluster first and then run the same explain plan query separately on each
> individual file.
>
>
>
> On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" wrote:
>
> > Hi Rahul,
> >
> > thanks for your
dfs storage plugin.
Query planning time is approx 30 secs
Query execution time is approx 1.5 secs
Regards,
Projjwal
-- Forwarded message --
From: PROJJWAL SAHA
Date: Fri, Mar 3, 2017 at 5:06 PM
Subject: Minimise query plan time for dfs plugin for local file system on
tsv file
To
having some effect on the planning...
>
> On Fri, Mar 3, 2017 at 6:08 AM, PROJJWAL SAHA wrote:
>
> > I did not change the default values used by drill.
> > Are you talking of changing planner.memory_limit
> > and planner.memory.max_query_memory_per_node ?
> > If
you set for planner ?
>
> On Fri, Mar 3, 2017 at 5:06 PM, PROJJWAL SAHA wrote:
>
> > Hello all,
> >
> > I am querying select * from dfs.xxx where yyy (filter condition)
> >
> > I am using dfs storage plugin that comes out of the box from drill on a
> > 1
Hello all,
I am querying select * from dfs.xxx where yyy (filter condition)
I am using dfs storage plugin that comes out of the box from drill on a 1GB
file, local to the drill cluster.
The 1GB file is split into 10 files of 100 MB each.
As expected, I see 11 minor and 2 major fragments.
The drill c
Hello,
I am running a select * query on a 1 GB csv file with a 5 node drill
cluster. The csv file is stored in another storage cluster within the
enterprise.
In the query profile, I see one major fragment and within the major
fragment, I see only 1 minor fragment. The hostname for the minor fragme
egion.
>
> In either case, from AWS console you can figure out how much network
> throughput you are getting if that is the bottleneck
> Also, the drill machines need CPU, so along with 32GB memory, having 8
> cores would be desirable
>
> On Tue, Feb 21, 2017 at 2:17 PM
hink majority of the time is spent on displaying the result set instead
> of querying the file if the drill server is on aws.
> If the drill server is local then it might be your network which might take
> a lot of time based on s3 bucket location and where your drill server is
>
> O
Hello all,
I am using 1GB data in the form of .tsv file, stored in Amazon S3 using
Drill 1.8. I am using default configurations of Drill using S3 storage
plugin coming out of the box. The drill bits are configured on a 5 node
cluster with 32GB RAM and 4 vCPUs.
I see that select * from xxx; query ta
On Mon, Feb 20, 2017 at 5:29 PM, PROJJWAL SAHA wrote:
> I am using 1GB data in the form of .tsv file, stored in Amazon S3 using
> Drill 1.8. I am using default configurations of Drill using S3 storage
> plugin coming out of the box. The drill bits are configured on a 5 node
> clust
25 matches