Re: How do I make json files less painful
Yes, it was a memory thing. I was running on a sandbox: first the query is killed, then 20 or 30 seconds later the kernel is still out of memory, I can't seem to kill anything, and then it stops the CPU. I'll send you the query and compressed data directly.

On Thu, Mar 19, 2015 at 6:16 PM, Jacques Nadeau wrote:

> Kernel panic? Can you try to share the information that causes this? Are
> you running out of memory? What type of system are you running on?
Re: How do I make json files less painful
Kernel panic? Can you try to share the information that causes this? Are you running out of memory? What type of system are you running on?

On Thu, Mar 19, 2015 at 2:00 PM, Jim Bates wrote:

> On first look I could read all the files, but doing a flatten caused all
> kinds of things that were bad. The worst was a repeatable kernel panic.
>
> I think I'm back to making the initial files smaller in the larger file
> sets.
>
> I have some files that are, say, 100M in size. Each file is a single-line
> array:
> {"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}
> What is the best way to represent that so it can be explored? Do I do what
> was suggested before and put each array entry on its own line?
> {"MyArrayInTheFile":[
> {"a":"1","b":"2"},
> {"a":"1","b":"2"},
> ...
> ]}
>
> What works best for the 0.8 code?
Re: How do I make json files less painful
Ok, large JSON arrays kept crashing the VM OS, so I just flattened the larger files with jq. Example, in keeping with what I already mentioned:

jq '.[] | {MyArrayInTheFile: .[]}' MyFile.json > MyNewfile.json

result:

{ "MyArrayInTheFile": {"a":"1","b":"2"} }
{ "MyArrayInTheFile": {"a":"1","b":"2"} }

On Thu, Mar 19, 2015 at 4:00 PM, Jim Bates wrote:

> On first look I could read all the files, but doing a flatten caused all
> kinds of things that were bad. The worst was a repeatable kernel panic.
>
> I think I'm back to making the initial files smaller in the larger file
> sets.
>
> I have some files that are, say, 100M in size. Each file is a single-line
> array:
> {"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}
> What is the best way to represent that so it can be explored? Do I do what
> was suggested before and put each array entry on its own line?
> {"MyArrayInTheFile":[
> {"a":"1","b":"2"},
> {"a":"1","b":"2"},
> ...
> ]}
>
> What works best for the 0.8 code?
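For anyone without jq handy, the same per-element flattening can be sketched in Python with the standard json module. This is only a rough equivalent of the jq one-liner above; the MyArrayInTheFile key and the file paths come from the example and are not fixed names. Note that json.load pulls the whole file into memory, so for files big enough to crash a small VM a streaming parser might still be needed.

```python
import json

def flatten_array_file(src_path, dst_path, key="MyArrayInTheFile"):
    """Turn {"MyArrayInTheFile": [e1, e2, ...]} into one small
    {"MyArrayInTheFile": e} object per line, like the jq example."""
    with open(src_path) as f:
        doc = json.load(f)  # loads the entire file into memory
    with open(dst_path, "w") as out:
        for element in doc[key]:
            # one self-contained JSON object per output line
            out.write(json.dumps({key: element}) + "\n")
```

Each output line is then a small, independent record rather than one huge array.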
Re: How do I make json files less painful
On first look I could read all the files, but doing a flatten caused all kinds of things that were bad. The worst was a repeatable kernel panic.

I think I'm back to making the initial files smaller in the larger file sets.

I have some files that are, say, 100M in size. Each file is a single-line array:

{"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}

What is the best way to represent that so it can be explored? Do I do what was suggested before and put each array entry on its own line?

{"MyArrayInTheFile":[
{"a":"1","b":"2"},
{"a":"1","b":"2"},
...
]}

What works best for the 0.8 code?

On Thu, Mar 19, 2015 at 12:59 PM, Jim Bates wrote:

> Ok, went to drill-0.8.0.31020-1 and it was 1000% better.
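The entry-per-line layout asked about above can be produced with a short Python sketch (the MyArrayInTheFile key is from the example; paths are illustrative). One caveat: since JSON readers ignore whitespace, the file is likely still parsed as a single record by a tool like Drill, so line breaks alone may not get around a record-size limit; splitting into separate top-level objects is usually what helps.

```python
import json

def one_entry_per_line(src_path, dst_path, key="MyArrayInTheFile"):
    """Rewrite {"MyArrayInTheFile": [...]} so each array entry sits on
    its own line, matching the layout suggested above."""
    with open(src_path) as f:
        doc = json.load(f)
    # compact form of each entry, one per line, comma-separated
    entries = [json.dumps(e, separators=(",", ":")) for e in doc[key]]
    with open(dst_path, "w") as out:
        out.write('{"%s":[\n' % key)
        out.write(",\n".join(entries))
        out.write("\n]}\n")
```

The output is still valid JSON, just line-broken, so any parser reads it back identically.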
Re: How do I make json files less painful
Ok, went to drill-0.8.0.31020-1 and it was 1000% better.

On Thu, Mar 19, 2015 at 12:16 PM, Sudhakar Thota wrote:

> I got the same issue; engineering recommended I use drill-0.8.0.
Re: How do I make json files less painful
I got the same issue; engineering recommended I use drill-0.8.0.

Sudhakar Thota
Sent from my iPhone

On Mar 19, 2015, at 9:22 AM, Jim Bates wrote:

> I constantly, constantly, constantly hit this.
>
> I have JSON files that are just a huge collection of an array of JSON
> objects.
>
> Example:
> "MyArrayInTheFile":
> [{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]
>
> My issue is that in exploring the data, I hit this:
>
> Query failed: Query stopped., Record was too large to copy into vector. [
> 39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]
>
> I can explore csv, tab, maprdb, and hive at fairly large data sets and
> limit the response to what fits in my system limitations, but not JSON in
> this format.
>
> The two options I have come up with to move forward are:
>
> 1. Strip out 90% of the array values in a file and explore that to get to
> my view, then go to a larger system and see if I have enough to get the
> job done.
> 2. Move to the larger system and explore there, taking resources that
> don't need to be spent on a science project.
>
> Hoping the smart people have a different option for me,
>
> Jim
Re: How do I make json files less painful
I've heard and experienced that an array at the root level can cause a problem. Remove the square brackets at the root of the object.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn

On Thu, Mar 19, 2015 at 9:49 AM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:

> See if you can use a prerelease 0.8 (if possible for your environment).
Re: How do I make json files less painful
See if you can use a prerelease 0.8 (if possible for your environment).

On Mar 19, 2015, at 9:39 AM, Jim Bates wrote:

> mapr-drill-0.7.0.29774-1
Re: How do I make json files less painful
I'll give that a shot, but it is troubling, because the actual data size is not the direct issue. One file that is 43M works fine, but one that is 5M is too large. I'll have to dig into some logging when time is more plentiful.

On Thu, Mar 19, 2015 at 11:28 AM, Matt wrote:

> Is each file a single json array object?
>
> If so, would converting the files to a format with one line per record be
> a potential solution?
>
> Example using jq (http://stedolan.github.io/jq/): jq -c '.[]'
Re: How do I make json files less painful
mapr-drill-0.7.0.29774-1

On Thu, Mar 19, 2015 at 11:36 AM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:

> Which Drill version are you using?
>
> —Andries
Re: How do I make json files less painful
Which Drill version are you using?

—Andries

On Mar 19, 2015, at 9:22 AM, Jim Bates wrote:

> I constantly, constantly, constantly hit this.
>
> I have JSON files that are just a huge collection of an array of JSON
> objects.
>
> Example:
> "MyArrayInTheFile":
> [{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]
>
> My issue is that in exploring the data, I hit this:
>
> Query failed: Query stopped., Record was too large to copy into vector. [
> 39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]
>
> I can explore csv, tab, maprdb, and hive at fairly large data sets and
> limit the response to what fits in my system limitations, but not JSON in
> this format.
>
> The two options I have come up with to move forward are:
>
> 1. Strip out 90% of the array values in a file and explore that to get to
> my view, then go to a larger system and see if I have enough to get the
> job done.
> 2. Move to the larger system and explore there, taking resources that
> don't need to be spent on a science project.
>
> Hoping the smart people have a different option for me,
>
> Jim
Re: How do I make json files less painful
Is each file a single json array object?

If so, would converting the files to a format with one line per record be a potential solution?

Example using jq (http://stedolan.github.io/jq/): jq -c '.[]'

On 19 Mar 2015, at 12:22, Jim Bates wrote:

I constantly, constantly, constantly hit this.

I have JSON files that are just a huge collection of an array of JSON objects.

Example:
"MyArrayInTheFile":
[{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]

My issue is that in exploring the data, I hit this:

Query failed: Query stopped., Record was too large to copy into vector. [
39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]

I can explore csv, tab, maprdb, and hive at fairly large data sets and limit the response to what fits in my system limitations, but not JSON in this format.

The two options I have come up with to move forward are:

1. Strip out 90% of the array values in a file and explore that to get to my view, then go to a larger system and see if I have enough to get the job done.
2. Move to the larger system and explore there, taking resources that don't need to be spent on a science project.

Hoping the smart people have a different option for me,

Jim
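A rough Python equivalent of the jq -c '.[]' suggestion, for a file whose top level is one big JSON array (file paths here are hypothetical, and like the jq version this assumes a bare root array rather than an object wrapping one; it also loads the whole array into memory, so very large files may still need a streaming parser):

```python
import json

def to_ndjson(src_path, dst_path):
    """Convert a file whose top level is a single JSON array,
    [ {...}, {...}, ... ], into one compact record per line,
    roughly what jq -c '.[]' prints for such a file."""
    with open(src_path) as f:
        records = json.load(f)
    with open(dst_path, "w") as out:
        for rec in records:
            # separators=(",", ":") gives jq -c style compact output
            out.write(json.dumps(rec, separators=(",", ":")) + "\n")
```

The resulting newline-delimited file gives the reader one small record per line instead of a single huge one.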