Re: Drill Capacity

2017-11-02 Thread Paul Rogers
Hi Yun,

I’ll suggest several ways to understand the issue based on the information 
you’ve provided. I generally like to see the full logs to diagnose such 
problems, but we can start with what we have so far.
 
How large is each record in your file? How many fields? How many bytes? 
(Alternatively, how big is a single input file and how many records does it 
contain?)

You mention the limit of 64K columns in CSV. This makes me wonder if you have a 
“jumbo” record. If each individual record is large, then there won’t be enough 
space in the sort to take even a single batch of records, and you’ll get the 
sv2 error that you saw.

We can guess the size, however, from the info you provided:

batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

This says you have a batch in memory and are trying to allocate some memory 
(the “sv2”). The allocated memory number tells us that each batch size is 
probably ~43 MB. But, the sort only has 42 MB to play with. The sort needs at 
least two batches in memory to make progress, hence the out-of-memory errors.

It would be nice to confirm this from the logs, but unfortunately, Drill does 
not normally log the size of each batch. As it turns out, however, the 
“managed” version that Boaz mentioned added more logging around this problem: 
it will tell you how large it thinks each batch is, and will warn if you have, 
say, a 43 MB batch but only 42 MB in which to sort.

(If you do want to use the “managed” version of the sort, I suggest you try 
Drill 1.12 when it is released as that version contains additional fixes to 
handle constrained memory.)

Also, at present, the JSON record reader loads 4096 records into each batch. If 
your file has at least that many records, then we can guess each record is 
about 43 MB / 4096 =~ 10 KB in size. (You can confirm, as noted above, by 
dividing total file size by record count.)
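
If you don’t already have a record count, a quick way to get one (the path 
below is just a placeholder for your file):

SELECT COUNT(*) FROM dfs.`/path/to/your/file.json`;

Total file size divided by that count gives the average record size.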

We are doing work to handle such large batches, but the work is not yet 
available in a release. Unfortunately, in the meantime, we also don’t let you 
control the batch size. But we can offer another solution.

Let's explain why the message you provided said that the “allocator limit” was 
42 MB. Drill does the following to allocate memory to the sort:

* Take the “max query memory per node” (default of 2 GB, regardless of actual 
direct memory);
* Divide by the number of sort operators in the plan (as shown in the 
visualized query profile);
* Divide by the “planner width”, which is, by default, 70% of the number of 
cores on your system.

In your case, if you are using the default 2 GB total, but getting 41 MB per 
sort, the divisor is 50. Maybe you have 2 sorts and 32 cores? (2 * 32 * 70% =~ 
45.) Or some other combination.
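
You can read both knobs directly from SQL, by the way; this query uses the 
same sys.options table Boaz showed:

SELECT name, num_val
FROM sys.options
WHERE name IN ('planner.memory.max_query_memory_per_node',
               'planner.width.max_per_node');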

We can’t reduce the number of sorts; that’s determined by your query. But, we 
can play with the other numbers.

First, we can increase the memory per query:

ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296

That is, 4 GB. This obviously means you must have at least 6 GB of direct 
memory; more is better.

And/or, we can reduce the number of fragments:

ALTER SESSION SET `planner.width.max_per_node` = <n>

The value is a bit tricky. Drill normally creates a number of fragments equal 
to 70% of the number of CPUs on your system. Let’s say you have 32 cores. If 
so, change the max_per_node to, say, 10 or even 5. This will mean fewer sorts 
and so more memory per sort, helping compensate for the “jumbo” batches in your 
query. Pick a number based on your actual number of cores.
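
For example, on a 32-core box you might try (adjust to your own core count):

ALTER SESSION SET `planner.width.max_per_node` = 10;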

As an alternative, as Ted suggested, you could create a larger number of 
smaller files as this would solve the batch size problem while also getting the 
parallelization benefits that Kunal mentioned.

That gives you three separate possible solutions. Try them one by one or 
together.

- Paul

>> On 11/2/17, 12:31 PM, "Yun Liu"  wrote:
>> 
>>Hi Kunal and Andries,
>> 
>>Thanks for your reply. We need json in this case because Drill only
>> supports up to 65536 columns in a csv file.


Re: Drill Capacity

2017-11-02 Thread Ted Dunning
What happens if you split your large file into 5 smaller files?



On Thu, Nov 2, 2017 at 12:52 PM, Yun Liu  wrote:

> Yes- I increased planner.memory.max_query_memory_per_node to 10GB
> HEAP to 12G
> Direct memory to 16G
> And Perm to 1024M
>
> It didn't have any schema changes. A file with the same format but less
> data works perfectly OK. I am unable to tell if there's corruption.
>
> Yun
>
> -Original Message-
> From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
> Sent: Thursday, November 2, 2017 3:35 PM
> To: user@drill.apache.org
> Subject: Re: Drill Capacity
>
> What memory setting did you increase? Have you tried 6 or 8GB?
>
> How much memory is allocated to Drill Heap and Direct memory for the
> embedded Drillbit?
>
> Also did you check the larger document doesn’t have any schema changes or
> corruption?
>
> --Andries
>
>
>
> On 11/2/17, 12:31 PM, "Yun Liu"  wrote:
>
> Hi Kunal and Andries,
>
> Thanks for your reply. We need json in this case because Drill only
> supports up to 65536 columns in a csv file. I also tried increasing the
> memory size to 4GB but I am still experiencing the same issues. Drill is
> installed in Embedded Mode.
>
> Thanks,
> Yun
>
> -Original Message-
> From: Kunal Khatua [mailto:kkha...@mapr.com]
> Sent: Thursday, November 2, 2017 2:01 PM
> To: user@drill.apache.org
> Subject: RE: Drill Capacity
>
> Hi Yun
>
> Andries' solution should address your problem. However, do understand
> that, unlike CSV files, a JSON file cannot be processed in parallel,
> because there is no clear record delimiter (CSV data usually has a new-line
> character to indicate the end of a record). So, the larger a file gets, the
> more work a single minor fragment has to do in processing it, including
> maintaining internal data-structures to represent the complex JSON document.
>
> The preferable way would be to create more JSON files so that the
> files can be processed in parallel.
>
> Hope that helps.
>
> ~ Kunal
>
> -Original Message-
> From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
> Sent: Thursday, November 02, 2017 10:26 AM
> To: user@drill.apache.org
> Subject: Re: Drill Capacity
>
> How much memory is allocated to the Drill environment?
> Embedded or in a cluster?
>
> I don’t think there is a particular limit, but a single JSON file will
> be read by a single minor fragment; in general it is better to match the
> number/size of files to the Drill environment.
>
> In the short term try to bump up planner.memory.max_query_memory_per_node
> in the options and see if that works for you.
>
> --Andries
>
>
>
> On 11/2/17, 7:46 AM, "Yun Liu"  wrote:
>
> Hi,
>
> I've been using Apache Drill actively and just wondering what is
> the capacity of Drill? I have a json file which is 390MB and it keeps
> throwing a DATA_READ ERROR. I have another json file with the exact same
> format but only 150MB, and it's processing fine. When I did a *select* on
> the large json, it returns successfully for some of the fields. None of
> these errors really apply to me. So I am trying to understand the maximum
> json file size Drill supports, or if there's something else I missed.
>
> Thanks,
>
> Yun Liu
> Solutions Delivery Consultant
> 321 West 44th St | Suite 501 | New York, NY 10036
> +1 212.871.8355 office | +1 646.752.4933 mobile
>
> CAST, Leader in Software Analysis and Measurement
> Achieve Insight. Deliver Excellence.
> Join the discussion http://blog.castsoftware.com/
> LinkedIn | Twitter <http://twitter.com/onquality> | Facebook
> <http://www.facebook.com/pages/CAST/105668942817177>


RE: about properly mapping parquet column

2017-11-02 Thread Lee, David
Replace single quotes with double quotes.

{"game_mode":"win_race", "result":"win", "race_time":"45", "air_time": "2"}

http://www.json.org/

select * from `/home/my_login/test.json`

game_mode   result   race_time   air_time
win_race    win      45          2
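
If your file really is Parquet rather than JSON, one possible angle (just a 
sketch, with `stats` standing in for your actual map column name) is to 
FLATTEN the key_value array that Drill produces for Parquet map columns:

SELECT kv.pair.`key` AS k, kv.pair.`value` AS v
FROM (SELECT FLATTEN(t.`stats`.`key_value`) AS pair
      FROM dfs.`/home/my_login/test.parquet` t) kv;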


-Original Message-
From: Abel Castellanos [mailto:abel.castella...@gameloft.com] 
Sent: Thursday, November 02, 2017 4:43 AM
To: user@drill.apache.org
Subject: about properly mapping parquet column

I created a parquet file with several columns from Spark 2.1.0.

One of these columns is a map type created from a dictionary.

I was expecting that I could query this field exactly the same way I query 
JSON data from Drill.

The problem is that the field is read by Drill in this format:

{"key_value":[{"key":"result","value":"win"},{"key":"game_mode","value":"win_race"},{"key":"air_time","value":"2"},{"key":"race_time","value":"45"}]}

when the original was

{'game_mode':'win_race', 'result':'win', 'race_time':'45', 'air_time': '2'}


Any idea to solve this problem?

-- 

*Abel Castellanos Carrazana*
Data Scientist/Engineer
Barcelona Data Team
Gameloft





Re: Drill Capacity

2017-11-02 Thread Boaz Ben-Zvi
 Hi Yun,

 Can you try using the “managed” version of the external sort? Either set 
this option to false:

0: jdbc:drill:zk=local> select * from sys.options where name like '%man%';
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
|            name            |   kind   | accessibleScopes  | optionScope  |  status  | num_val  | string_val  | bool_val  | float_val  |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
| exec.sort.disable_managed  | BOOLEAN  | ALL               | BOOT         | DEFAULT  | null     | null        | false     | null       |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+

Or override it to ‘false’ in the configuration:

0: jdbc:drill:zk=local> select * from sys.boot where name like '%managed%';
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
|                     name                      |   kind   | accessibleScopes  | optionScope  | status  | num_val  | string_val  | bool_val  | float_val  |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| drill.exec.options.exec.sort.disable_managed  | BOOLEAN  | BOOT              | BOOT         | BOOT    | null     | null        | false     | null       |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+

i.e., in the drill-override.conf file:

  sort: {
    external: {
      disable_managed: false
    }
  }
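
Alternatively, since the option’s accessibleScopes is ALL, it should also be 
settable per session, without touching the config file:

0: jdbc:drill:zk=local> ALTER SESSION SET `exec.sort.disable_managed` = false;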

  Please let us know if this change helped,

 -- Boaz 


On 11/2/17, 1:12 PM, "Yun Liu"  wrote:

Please let me know what further information I could provide to get this 
going. I am also experiencing a separate issue:

RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Unable to allocate sv2 for 8501 records, and not enough batchGroups to 
spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

Current settings are: 
planner.memory.max_query_memory_per_node = 10GB 
HEAP to 12G 
Direct memory to 32G 
Perm to 1024M

What is the issue here?

Thanks,
Yun

-Original Message-
From: Yun Liu [mailto:y@castsoftware.com] 
Sent: Thursday, November 2, 2017 3:52 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Yes- I increased planner.memory.max_query_memory_per_node to 10GB, HEAP to 
12G, Direct memory to 16G, and Perm to 1024M.

It didn't have any schema changes. A file with the same format but less 
data works perfectly OK. I am unable to tell if there's corruption.

Yun

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 2, 2017 3:35 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

What memory setting did you increase? Have you tried 6 or 8GB?

How much memory is allocated to Drill Heap and Direct memory for the 
embedded Drillbit?

Also did you check the larger document doesn’t have any schema changes or 
corruption?

--Andries



On 11/2/17, 12:31 PM, "Yun Liu"  wrote:

Hi Kunal and Andries,

Thanks for your reply. We need json in this case because Drill only 
supports up to 65536 columns in a csv file. I also tried increasing the memory 
size to 4GB but I am still experiencing the same issues. Drill is installed in 
Embedded Mode.

Thanks,
Yun

-Original Message-
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Thursday, November 2, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun

Andries' solution should address your problem. However, do understand 
that, unlike CSV files, a JSON file cannot be processed in parallel, because 
there is no clear record delimiter (CSV data usually has a new-line character 
to indicate the end of a record). So, the larger a file gets, the more work a 
single minor fragment has to do in processing it, including maintaining 
internal data-structures to represent the complex JSON document. 

The preferable way would be to create more JSON files so that the files 
can be processed in parallel. 

Hope that helps.

~ Kunal

-Original Message-
From: Andries 

about properly mapping parquet column

2017-11-02 Thread Abel Castellanos

I created a parquet file with several columns from Spark 2.1.0.

One of these columns is a map type created from a dictionary.

I was expecting that I could query this field exactly the same way I query 
JSON data from Drill.

The problem is that the field is read by Drill in this format:

{"key_value":[{"key":"result","value":"win"},{"key":"game_mode","value":"win_race"},{"key":"air_time","value":"2"},{"key":"race_time","value":"45"}]}

when the original was

{'game_mode':'win_race', 'result':'win', 'race_time':'45', 'air_time': '2'}


Any idea to solve this problem?

--

*Abel Castellanos Carrazana*
Data Scientist/Engineer
Barcelona Data Team
Gameloft



RE: Drill Capacity

2017-11-02 Thread Yun Liu
Please let me know what further information I could provide to get this 
going. I am also experiencing a separate issue:

RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Unable to allocate sv2 for 8501 records, and not enough batchGroups to spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

Current settings are: 
planner.memory.max_query_memory_per_node = 10GB 
HEAP to 12G 
Direct memory to 32G 
Perm to 1024M

What is the issue here?

Thanks,
Yun

-Original Message-
From: Yun Liu [mailto:y@castsoftware.com] 
Sent: Thursday, November 2, 2017 3:52 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Yes- I increased planner.memory.max_query_memory_per_node to 10GB, HEAP to 
12G, Direct memory to 16G, and Perm to 1024M.

It didn't have any schema changes. A file with the same format but less 
data works perfectly OK. I am unable to tell if there's corruption.

Yun

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 2, 2017 3:35 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

What memory setting did you increase? Have you tried 6 or 8GB?

How much memory is allocated to Drill Heap and Direct memory for the embedded 
Drillbit?

Also did you check the larger document doesn’t have any schema changes or 
corruption?

--Andries



On 11/2/17, 12:31 PM, "Yun Liu"  wrote:

Hi Kunal and Andries,

Thanks for your reply. We need json in this case because Drill only 
supports up to 65536 columns in a csv file. I also tried increasing the memory 
size to 4GB but I am still experiencing the same issues. Drill is installed in 
Embedded Mode.

Thanks,
Yun

-Original Message-
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Thursday, November 2, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun

Andries' solution should address your problem. However, do understand that, 
unlike CSV files, a JSON file cannot be processed in parallel, because there is 
no clear record delimiter (CSV data usually has a new-line character to 
indicate the end of a record). So, the larger a file gets, the more work a 
single minor fragment has to do in processing it, including maintaining 
internal data-structures to represent the complex JSON document. 

The preferable way would be to create more JSON files so that the files can 
be processed in parallel. 

Hope that helps.

~ Kunal

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] 
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be 
read by a single minor fragment; in general it is better to match the 
number/size of files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node 
in the options and see if that works for you.

--Andries



On 11/2/17, 7:46 AM, "Yun Liu"  wrote:

Hi,

I've been using Apache Drill actively and just wondering what is the 
capacity of Drill? I have a json file which is 390MB and it keeps throwing 
a DATA_READ ERROR. I have another json file with the exact same format but 
only 150MB, and it's processing fine. When I did a *select* on the large 
json, it returns successfully for some of the fields. None of these errors 
really apply to me. So I am trying to understand the maximum json file size 
Drill supports, or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/
LinkedIn | 
Twitter | 
Facebook







RE: Drill Capacity

2017-11-02 Thread Yun Liu
Yes- I increased planner.memory.max_query_memory_per_node to 10GB
HEAP to 12G
Direct memory to 16G
And Perm to 1024M

It didn't have any schema changes. A file with the same format but less 
data works perfectly OK. I am unable to tell if there's corruption.

Yun

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] 
Sent: Thursday, November 2, 2017 3:35 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

What memory setting did you increase? Have you tried 6 or 8GB?

How much memory is allocated to Drill Heap and Direct memory for the embedded 
Drillbit?

Also did you check the larger document doesn’t have any schema changes or 
corruption?

--Andries



On 11/2/17, 12:31 PM, "Yun Liu"  wrote:

Hi Kunal and Andries,

Thanks for your reply. We need json in this case because Drill only 
supports up to 65536 columns in a csv file. I also tried increasing the memory 
size to 4GB but I am still experiencing the same issues. Drill is installed in 
Embedded Mode.

Thanks,
Yun

-Original Message-
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Thursday, November 2, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun

Andries' solution should address your problem. However, do understand that, 
unlike CSV files, a JSON file cannot be processed in parallel, because there is 
no clear record delimiter (CSV data usually has a new-line character to 
indicate the end of a record). So, the larger a file gets, the more work a 
single minor fragment has to do in processing it, including maintaining 
internal data-structures to represent the complex JSON document. 

The preferable way would be to create more JSON files so that the files can 
be processed in parallel. 

Hope that helps.

~ Kunal

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] 
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be 
read by a single minor fragment; in general it is better to match the 
number/size of files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node 
in the options and see if that works for you.

--Andries



On 11/2/17, 7:46 AM, "Yun Liu"  wrote:

Hi,

I've been using Apache Drill actively and just wondering what is the 
capacity of Drill? I have a json file which is 390MB and it keeps throwing 
a DATA_READ ERROR. I have another json file with the exact same format but 
only 150MB, and it's processing fine. When I did a *select* on the large 
json, it returns successfully for some of the fields. None of these errors 
really apply to me. So I am trying to understand the maximum json file size 
Drill supports, or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/
LinkedIn | 
Twitter | 
Facebook







RE: Drill Capacity

2017-11-02 Thread Kunal Khatua
Hi Yun

Andries' solution should address your problem. However, do understand that, 
unlike CSV files, a JSON file cannot be processed in parallel, because there is 
no clear record delimiter (CSV data usually has a new-line character to 
indicate the end of a record). So, the larger a file gets, the more work a 
single minor fragment has to do in processing it, including maintaining 
internal data-structures to represent the complex JSON document. 

The preferable way would be to create more JSON files so that the files can be 
processed in parallel. 
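
For example (the directory path is purely illustrative), pointing Drill at a 
directory of smaller JSON files lets them be read in parallel, up to the 
planner's width limits:

SELECT COUNT(*) FROM dfs.`/data/events_json/`;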

Hope that helps.

~ Kunal

-Original Message-
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] 
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be read 
by a single minor fragment; in general it is better to match the number/size of 
files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node in 
the options and see if that works for you.

--Andries



On 11/2/17, 7:46 AM, "Yun Liu"  wrote:

Hi,

I've been using Apache Drill actively and just wondering what is the 
capacity of Drill? I have a json file which is 390MB and it keeps throwing 
a DATA_READ ERROR. I have another json file with the exact same format but 
only 150MB, and it's processing fine. When I did a *select* on the large 
json, it returns successfully for some of the fields. None of these errors 
really apply to me. So I am trying to understand the maximum json file size 
Drill supports, or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/
LinkedIn | 
Twitter | 
Facebook





Re: Drill Capacity

2017-11-02 Thread Andries Engelbrecht
How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be read 
by a single minor fragment; in general it is better to match the number/size of 
files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node in 
the options and see if that works for you.

--Andries



On 11/2/17, 7:46 AM, "Yun Liu"  wrote:

Hi,

I've been using Apache Drill actively and just wondering what is the 
capacity of Drill? I have a json file which is 390MB and it keeps throwing 
a DATA_READ ERROR. I have another json file with the exact same format but 
only 150MB, and it's processing fine. When I did a *select* on the large 
json, it returns successfully for some of the fields. None of these errors 
really apply to me. So I am trying to understand the maximum json file size 
Drill supports, or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/
LinkedIn | 
Twitter | 
Facebook





Re: Drill Capacity

2017-11-02 Thread Prasad Nagaraj Subramanya
Hi Yun,

Drill is designed to query large datasets. There is no specific limit on
the size; it works well even when the data runs to hundreds of GBs.

A DATA_READ ERROR has something to do with the data in your file: the data in
some of the columns may not be consistent with the expected datatype.
Please refer to this link for one such example -
https://stackoverflow.com/questions/40217328/apache-drill-mysql-and-data-read-error-failure-while-attempting-to-read-from
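
If the failure turns out to be a type conflict, one low-risk experiment is to 
read every JSON scalar as text and cast afterwards:

ALTER SESSION SET `store.json.all_text_mode` = true;

And, if your Drill version supports it, the JSON reader can also be told to 
skip records it cannot parse:

ALTER SESSION SET `store.json.reader.skip_invalid_records` = true;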


Thanks,
Prasad

On Thu, Nov 2, 2017 at 7:46 AM, Yun Liu  wrote:

> Hi,
>
> I've been using Apache Drill actively and just wondering what is the
> capacity of Drill? I have a json file which is 390MB and it keeps throwing
> a DATA_READ ERROR. I have another json file with the exact same format but
> only 150MB, and it's processing fine. When I did a *select* on the large
> json, it returns successfully for some of the fields. None of these errors
> really apply to me. So I am trying to understand the maximum json file size
> Drill supports, or if there's something else I missed.
>
> Thanks,
>
> Yun Liu
> Solutions Delivery Consultant
> 321 West 44th St | Suite 501 | New York, NY 10036
> +1 212.871.8355 office | +1 646.752.4933 mobile
>
> CAST, Leader in Software Analysis and Measurement
> Achieve Insight. Deliver Excellence.
> Join the discussion http://blog.castsoftware.com/
> LinkedIn | Twitter <http://twitter.com/onquality> | Facebook
> <http://www.facebook.com/pages/CAST/105668942817177>
>
>


Drill Capacity

2017-11-02 Thread Yun Liu
Hi,

I've been using Apache Drill actively and just wondering what is the capacity 
of Drill? I have a json file which is 390MB and it keeps throwing a 
DATA_READ ERROR. I have another json file with the exact same format but only 
150MB, and it's processing fine. When I did a *select* on the large json, it 
returns successfully for some of the fields. None of these errors really apply 
to me. So I am trying to understand the maximum json file size Drill supports, 
or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/
LinkedIn | 
Twitter | 
Facebook