Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
Yes, it was a memory thing. I was running on a sandbox; first the query
was killed, then 20 or 30 seconds later the kernel was still out of memory,
couldn't seem to kill anything, and then the CPU stalled. I'll send you the
query and compressed data directly.



Re: How do I make json files less painful

2015-03-19 Thread Jacques Nadeau
Kernel panic?  Can you try to share the information that causes this?  Are
you running out of memory?  What type of system are you running on?



Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
OK, large JSON arrays kept crashing the VM's OS, so I just flattened the
larger files with jq.

Example, in keeping with what I already mentioned:

jq '.[] | {MyArrayInTheFile: .[]}' MyFile.json > MyNewfile.json

result:
{ "MyArrayInTheFile": {"a":"1","b":"2"} }
{ "MyArrayInTheFile": {"a":"1","b":"2"} }
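For anyone without jq handy, the same flattening can be sketched in Python. This is a minimal, assumed equivalent of the jq filter above; the file paths and the `flatten` helper name are placeholders, and it loads the whole file into memory, which was apparently fine here for ~100M inputs:

```python
import json

def flatten(in_path, out_path, key="MyArrayInTheFile"):
    """Rewrite {"MyArrayInTheFile": [ {...}, {...}, ... ]} as one
    wrapped object per line, mirroring jq '.[] | {MyArrayInTheFile: .[]}'."""
    with open(in_path) as f:
        data = json.load(f)  # reads the entire file into memory
    with open(out_path, "w") as out:
        for item in data[key]:
            out.write(json.dumps({key: item}) + "\n")
```

Each output line is then a self-contained record, so the reader no longer has to materialize the whole array as one record.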





Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
On first look I could read all the files, but doing a flatten caused all
kinds of bad behavior. The worst was a repeatable kernel panic.

I think I'm back to making the initial data smaller in the larger file sets.

I have some files that are, say, 100M in size. Each file is a single-line
array:
{"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}
What is the best way to represent that so it can be explored? Do I do what
was suggested before and put each array entry on its own line?
{"MyArrayInTheFile":[
{"a":"1","b":"2"},
{"a":"1","b":"2"},
...
]}

What works best for the 0.8 code?




Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
Ok, went to drill-0.8.0.31020-1 and it was 1000% better.



Re: How do I make json files less painful

2015-03-19 Thread Sudhakar Thota
I got the same issue; engineering recommended I use drill-0.8.0.

Sudhakar Thota
Sent from my iPhone



Re: How do I make json files less painful

2015-03-19 Thread Kristine Hahn
I've heard and experienced that an array at the root level can cause a
problem. Remove the square brackets at the root of the object.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn




Re: How do I make json files less painful

2015-03-19 Thread Andries Engelbrecht
See if you can use a prerelease 0.8 (if possible for your environment).




Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
I'll give that a shot, but it's troubling because the actual data size is
not the direct issue.

One file that is 43M works fine, but one that is 5M is too large. I'll have
to dig into some logging when time is more plentiful.
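Since the error complains about a single record being too large rather than the file, a quick sanity check is to measure the biggest record in each file. A rough sketch, assuming one record per line (a 5M file that is one giant single-line array is one giant record, which would explain why it fails while a 43M file of small records does not); the `largest_record` helper name is made up:

```python
def largest_record(path):
    """Return the length in bytes of the longest line (record) in a file.
    For a single-line JSON array file this is essentially the file size."""
    worst = 0
    with open(path, "rb") as f:
        for line in f:
            worst = max(worst, len(line.strip()))
    return worst
```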



Re: How do I make json files less painful

2015-03-19 Thread Jim Bates
mapr-drill-0.7.0.29774-1



Re: How do I make json files less painful

2015-03-19 Thread Andries Engelbrecht
Which Drill version are you using?

—Andries





Re: How do I make json files less painful

2015-03-19 Thread Matt

Is each file a single JSON array object?

If so, would converting the files to a format with one line per record be a
potential solution?


Example using jq (http://stedolan.github.io/jq/): jq -c '.[]'
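If jq isn't available, the same one-line-per-record conversion can be sketched in Python (a sketch, assuming each file is a single top-level JSON array; the `to_ndjson` name and file paths are placeholders):

```python
import json

def to_ndjson(in_path, out_path):
    """Equivalent of `jq -c '.[]' in.json > out.json`: read one top-level
    JSON array and write each element as a compact object on its own line."""
    with open(in_path) as f:
        records = json.load(f)
    with open(out_path, "w") as out:
        for rec in records:
            out.write(json.dumps(rec, separators=(",", ":")) + "\n")
```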




How do I make json files less painful

2015-03-19 Thread Jim Bates
I constantly, constantly, constantly hit this.

I have JSON files that are just one huge array of JSON objects.

Example:
"MyArrayInTheFile":
[{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]

My issue is that when exploring the data, I hit this:

Query failed: Query stopped., Record was too large to copy into vector. [
39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]

I can explore CSV, tab-delimited, MapR-DB, and Hive data at fairly large
sizes, limiting the response to what fits within my system limitations, but
not JSON in this format.

The two options I have come up with to move forward are:

   1. Strip out 90% of the array values in a file and explore that to get
   to my view, then go to a larger system and see if I have enough to get the
   job done.
   2. Move to the larger system and explore there, taking resources that
   don't need to be spent on a science project.

Hoping the smart people have a different option for me,

Jim