Re: deploy dockerized drill cluster
It's still not picking up the store.json* config changes. The only way I can see to set these is by running an ALTER SYSTEM query after the Drill API is up.

Scott Kinney | DevOps
stem | m 510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and proprietary information and material for the sole use of the intended recipient(s). Any review, use or distribution that has not been expressly authorized by Stem, Inc. is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. Thank you.

From: John Omernik
Sent: Monday, July 25, 2016 8:21 AM
To: user
Subject: Re: deploy dockerized drill cluster

Try (for the sake of the conversation here) using host networking, and see if it changes how successful your setup is. (I know bridged is preferred, but try the host side and see what happens.)

John
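The ALTER SYSTEM workaround mentioned above can be scripted so that it runs once the Drillbit is reachable. The following is a minimal sketch, assuming the web/REST port is the default 8047 on localhost and that queries are posted to Drill's `/query.json` endpoint; the helper names are hypothetical and the request is only built here, not sent, until `apply_options` is called.

```python
import json
import urllib.request

# Assumption: a Drillbit's web/REST port is reachable at this address.
DRILL_URL = "http://localhost:8047"

# The two JSON reader options discussed in this thread.
JSON_OPTIONS = {
    "store.json.all_text_mode": True,
    "store.json.read_numbers_as_double": True,
}


def alter_system_sql(option, value):
    """Build an ALTER SYSTEM statement for a boolean option."""
    literal = "true" if value else "false"
    return f"ALTER SYSTEM SET `{option}` = {literal}"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request for the /query.json endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_options(options, base_url=DRILL_URL):
    """Send one ALTER SYSTEM query per option (hypothetical helper)."""
    for option, value in options.items():
        request = build_query_request(alter_system_sql(option, value), base_url)
        with urllib.request.urlopen(request) as response:
            response.read()
```

Calling `apply_options(JSON_OPTIONS)` from a container start-up script, after the REST port answers, would be one way to use it.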
Re: deploy dockerized drill cluster
I'm running the docker in bridged network mode.

Scott Kinney | DevOps

From: John Omernik
Sent: Sunday, July 24, 2016 8:28 AM
To: user
Subject: Re: deploy dockerized drill cluster

Are you running Drill in host networking or bridged networking?
Re: deploy dockerized drill cluster
Hm, I must have set those another way in embedded mode. I can't see where. Those settings persist between Drill restarts.

Scott Kinney | DevOps

From: Abhishek Girish
Sent: Friday, July 22, 2016 1:57 PM
To: Drill User List
Subject: Re: deploy dockerized drill cluster

You can set boot-level start-up options in drill-override.conf [1]. But I don't think we can do the same with the system options. Someone else can comment if there is a workaround.

Why it works for you with drill-embedded is something I'm trying to understand. I attempted this and couldn't manage to get those options to show up in embedded mode.

[1] https://drill.apache.org/docs/start-up-options/
deploy dockerized drill cluster
I have built a Drill Docker image very much like https://github.com/bigstepinc/apache-drill/blob/master/Dockerfile

I volume mount a drill-override.conf file that looks like:

drill.exec:{
  cluster-id: drill1,
  zk.connect: "192.1.1.1:2181",
  sys.store.provider.local.path: "/drill-storage",
  store.json.all_text_mode: True,
  store.json.read_numbers_as_double: True
}

It seems to be connecting to ZooKeeper (otherwise it would fail, right? I should actually confirm this).

I know it is picking up my s3 plugin that I volume mount inside /drill-storage, but it's not setting the JSON all_text_mode and read_numbers_as_double options.

I have a drill-embedded instance that seems to pick up these JSON settings from the drill-override file just fine.

Can you see what I'm doing wrong?

Scott Kinney | DevOps
Re: Error creating view
Yep, I didn't set the workspace 'writable': true. User error, as usual.

Scott Kinney | DevOps

From: Abhishek Girish
Sent: Wednesday, July 20, 2016 8:49 AM
To: Drill User List
Subject: Re: Error creating view

Can you change the workspace to one where you would like the view to be created? For example, you could say `use dfsl.tmp;` followed by your create view command. The error here says that the dfsl.default workspace is not writable.
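The fix described above amounts to flipping one flag in the storage plugin's workspace definition. The sketch below shows that change on a plugin config held as a Python dictionary. The `dfsl` contents and workspace names are hypothetical, modeled on the general shape of a file-type storage plugin, and the helper name is an assumption for illustration.

```python
import copy


def make_workspace_writable(plugin_config, workspace):
    """Return a copy of a storage plugin config with one workspace writable."""
    updated = copy.deepcopy(plugin_config)
    workspaces = updated.get("workspaces") or {}
    if workspace not in workspaces:
        raise KeyError(f"workspace {workspace!r} not found")
    workspaces[workspace]["writable"] = True
    return updated


# Hypothetical config shaped like a copy of a file-type storage plugin.
dfsl = {
    "type": "file",
    "connection": "file:///",
    "workspaces": {
        "root": {"location": "/", "writable": False, "defaultInputFormat": None},
        "tmp": {"location": "/tmp", "writable": True, "defaultInputFormat": None},
    },
}

fixed = make_workspace_writable(dfsl, "root")
```

The original dictionary is left untouched, so the before and after configs can be compared before the change is saved back to the plugin.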
Error creating view
I'm trying to create a view with this command:

create view dfsl.smiview as select * from dfsl.`/path/to/view.json`;

Error: PARSE ERROR: Unable to create or drop tables/views. Schema [dfsl] is immutable.

[Error Id: da4ae9f6-016a-487a-a85d-7297d9e22187 on ops-apachedrill:31010] (state=,code=0)

dfsl is just a copy of the default dfs storage plugin.

Scott Kinney | DevOps
Re: Best way to set schema to handle different json structures
No, I haven't. I'll give that a try. Thanks.

Scott Kinney | DevOps

From: rahul challapalli
Sent: Monday, July 11, 2016 3:51 PM
To: user
Subject: Re: Best way to set schema to handle different json structures

Did you try creating a view with the merged schema? Then you can try running all your queries on top of that view.

- Rahul
Best way to set schema to handle different json structures
We have several different JSON structures we want to run queries across. I can take a sample of each, merge the JSON together as Python dictionaries, write that out to a file, and have Drill read that file first to set the schema. But I don't think this will be very practical, as we will have our data in s3 in a name/year/month/day layout and I don't want to have to put this schema file in every directory in s3. That seems unmanageable.

Is there a way to set the schema from a file before making a query via the REST API?

Scott Kinney | DevOps
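The merging step described above (one sample per structure, combined as Python dictionaries) could look roughly like this. It is only a sketch: the sample records are invented for illustration, nested objects are merged key by key, and when two samples disagree on a non-object value the first one seen is kept, which is an arbitrary choice.

```python
from functools import reduce


def merge_samples(base, sample):
    """Recursively merge `sample` into a copy of `base`.

    Nested dicts are merged key by key; for any other conflict the value
    already present in `base` wins.
    """
    merged = dict(base)
    for key, value in sample.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = merge_samples(merged[key], value)
        elif key not in merged:
            merged[key] = value
    return merged


# Hypothetical samples, one per JSON structure.
samples = [
    {"id": 1, "meta": {"site": "x"}},
    {"soc": [99.0], "meta": {"fw": "1.2"}},
]

merged_schema_record = reduce(merge_samples, samples, {})
```

Lists are not merged element by element here; a record with the widest list content would need to come first.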
Re: missing data in json structure when using web / api
Not that I know of, but I'm new to Drill. I've done 'alter system' for the JSON all_text_mode and read_numbers_as_double options. Do you know of a setting that might cause something like this?

Scott Kinney | DevOps

From: John Omernik
Sent: Friday, July 01, 2016 4:06 PM
To: user@drill.apache.org
Subject: Re: missing data in json structure when using web / api

Are you using options that are maintained in the CLI but not the REST API due to a lack of impersonation?
missing data in json structure when using web / api
When I query from sqlline I can see all the data (a very complicated, nested JSON structure), but when I query with the API or the web UI a lot of the data is missing.

Scott Kinney | DevOps
Re: array in json with mixed values (int and float)
It didn't work when I did an alter session via the API, but it worked when I did an alter system via the REPL. I'm guessing each query via the API is its own session, so an alter session via the API only lasts for that one call? Anywho, that did the trick, Parth. Thank you!

Scott Kinney | DevOps
Re: array in json with mixed values (int and float)
That looks promising but didn't work.

Scott Kinney | DevOps

From: Parth Chandra
Sent: Friday, July 01, 2016 10:43 AM
To: user@drill.apache.org
Subject: Re: array in json with mixed values (int and float)

I haven't tried this myself, but setting store.json.read_numbers_as_double to true might help.
array in json with mixed values (int and float)
Running a query on a JSON file via the API returns an error that I don't see when running the same query in the REPL.

"errorMessage" : "UNSUPPORTED_OPERATION ERROR: In a list of type FLOAT8, encountered a value of type BIGINT. Drill does not support lists of different types.\n\nFile /PowerBladeAvahi.1.telemetry/json/telemetry_flatstore_3_2_prod-telemetry-w2-9_1.log-143771.json.gz\nRecord 1\nLine 1\nColumn 502\nField soc\nFragment 0:0\n\n[Error Id: be38e1c4-b1c0-4d55-9ab1-fe4ebdc44a9e on ops-apachedrill:31010]"

I pulled the line out of the file. There is a key 'foo': [ 99, 99.1, 99.8 ].
Is there a way to get Drill to handle this? Maybe treat all ints as floats?

Scott Kinney | DevOps
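The "treat all ints as floats" idea can also be applied to the data before Drill reads it. This is a rough preprocessing sketch, not something Drill does itself: it walks a parsed JSON record and turns every integer into a float, so a list such as [ 99, 99.1, 99.8 ] ends up with a single numeric type. The function name is hypothetical.

```python
import json


def ints_to_floats(value):
    """Recursively convert integers to floats in parsed JSON data."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; leave true/false alone.
        return value
    if isinstance(value, int):
        return float(value)
    if isinstance(value, list):
        return [ints_to_floats(item) for item in value]
    if isinstance(value, dict):
        return {key: ints_to_floats(item) for key, item in value.items()}
    return value


record = json.loads('{"foo": [99, 99.1, 99.8]}')
normalized = ints_to_floats(record)
normalized_line = json.dumps(normalized)
```

Like store.json.read_numbers_as_double, this changes every number in the record, not only the ones inside lists.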
Re: gzipped json files not named .json.gz
Hi Jason,

Thanks for getting back to me. We were able to get the Spark job to append the .json.gz, so we are OK for now. I tried working with local JSON files. Drill will not query a file if it's not named .json. I didn't try gzipped. But since we got them renamed in s3, I'm out of the woods. Thanks!

Scott Kinney | DevOps

From: Jason Altekruse
Sent: Tuesday, June 28, 2016 3:05 PM
To: user
Subject: Re: gzipped json files not named .json.gz

Hi Scott,

From some quick testing, setting the defaultInputFormat to "json" appears to be working as it was designed. It is true that we have the limitation of relying entirely on extensions for detecting compression of text and json files. I am able to read all of these files in a workspace with JSON set as the default format. Were you not seeing this behavior?

a
a.gz
a.json
a.json.gz

We could consider adding default compression as an option in a workspace, but are you really unable to move the files? It seems like the best option might be to just rename, as I would think other tools would have trouble reading these as well.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Jun 28, 2016 at 2:48 PM, Parth Chandra wrote:
> Yes, I believe that would work if the file is not compressed.
Re: Scaling Drill with s3 plugin
Ah, OK. That's something.

Scott Kinney | DevOps

From: Jason Altekruse
Sent: Tuesday, June 28, 2016 1:25 PM
To: user
Subject: Re: Scaling Drill with s3 plugin

You can also configure storage plugins with the REST API. [1]

[1] https://drill.apache.org/docs/rest-api/
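Configuring a storage plugin over the REST API, as suggested above, could be scripted along these lines. This is a sketch under assumptions: it posts a name and config to `/storage/{name}.json` on the default web port 8047, and the plugin settings shown (bucket name, `s3a://` connection, workspace and format entries) are placeholders, not a tested S3 setup. Credentials are not handled here.

```python
import json
import urllib.request


def build_plugin_request(name, config, base_url="http://localhost:8047"):
    """Build (but do not send) a POST that creates or updates a storage plugin."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical plugin definition; every value here is a placeholder.
s3_config = {
    "type": "file",
    "enabled": True,
    "connection": "s3a://example-bucket",
    "workspaces": {
        "root": {"location": "/", "writable": False, "defaultInputFormat": None},
    },
    "formats": {"json": {"type": "json", "extensions": ["json"]}},
}

request = build_plugin_request("s3", s3_config)
# To actually send it: urllib.request.urlopen(request)
```

Running the same script against each new node, or once against a cluster that shares its plugin registry, would avoid clicking through the web interface.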
Scaling Drill with s3 plugin
From what I can tell, the s3 plugin must be configured via the web interface. We really need Drill to be able to scale and cluster. Has anyone found a way to configure Drill to use s3 with XML (or whatever) files rather than the web interface?

Scott Kinney | DevOps
Re: gzipped json files not named .json.gz
Well, that's a bummer, but I believe it. Setting "defaultInputFormat": "json" doesn't seem to have any effect.

Scott Kinney | DevOps

From: Parth Chandra
Sent: Tuesday, June 28, 2016 11:36 AM
To: user@drill.apache.org
Subject: Re: gzipped json files not named .json.gz

Hi Scott,

Unlikely that this will work without the extension. Drill uses Hadoop's CompressionCodecFactory class [1], which infers the compression type from the extension.

Parth

[1] https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec(org.apache.hadoop.fs.Path)
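To illustrate why the file name matters, here is a simplified model of suffix-based codec lookup. It is not Hadoop's actual implementation; the suffix table and both function names are assumptions made up for this sketch. The second helper shows one hypothetical way to derive a .json.gz name for a copy of a gzipped JSON file.

```python
# Simplified model of suffix-based codec lookup. The table is an
# illustrative assumption, not Hadoop's real codec registry.
CODEC_BY_SUFFIX = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".deflate": "deflate",
}


def infer_codec(path):
    """Return the codec name implied by the file suffix, or None."""
    for suffix, codec in CODEC_BY_SUFFIX.items():
        if path.endswith(suffix):
            return codec
    return None


def drill_friendly_name(path):
    """Return a name ending in .json.gz (assumes the file is gzipped JSON)."""
    if path.endswith(".json.gz"):
        return path
    if path.endswith(".gz"):
        return path[: -len(".gz")] + ".json.gz"
    return path + ".json.gz"


if __name__ == "__main__":
    for name in ("part-00000", "part-00000.gz", "part-00000.json.gz"):
        print(name, infer_codec(name), drill_friendly_name(name))
```

With a lookup like this, a file with no recognized suffix yields no codec at all, so the reader has nothing to tell it the bytes are gzipped, regardless of what defaultInputFormat says.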
gzipped json files not named .json.gz
Can I have Drill open gzipped JSON files whose names do not end in .json.gz? We have a Spark job generating these files and it just doesn't want to change the name or append the .json.gz.

Scott Kinney | DevOps
Re: drill wont open parquet files in s3 with _metadata and _common_metadata files
Never mind, this is because I changed the directory structure.

Scott Kinney | DevOps
drill wont open parquet files in s3 with _metadata and _common_metadata files
We have a Spark job generating Parquet files and uploading them to s3. Drill won't query these files unless I delete the _metadata and _common_metadata files from s3. I'd rather not modify the Spark job or have to delete these files. Is there a way to get Drill to ignore these files or work around them in some way?

Thanks all

Scott Kinney | DevOps
best approach for complex, several levels of json nesting
We have lots of different JSON structures gzipped in s3 that we want to query (currently looking at Drill and Druid). What is the best approach for getting this into a queryable format for Drill? I tried FLATTEN(KVGEN(data)), but since our structures are often nested multiple levels deep, this doesn't work. We have also converted to Parquet, but when I run Drill on a Parquet file the structure isn't getting flattened either. What is the best approach for this situation?

Thanks all,

Scott Kinney | DevOps
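Nobody in this thread proposed a fix, so the following is only one possible preprocessing idea, not a recommended Drill technique: flatten the nested objects recursively before the files reach Drill. The function name, the underscore separator and the sample record are assumptions for illustration.

```python
def flatten(record, prefix="", sep="_"):
    """Flatten nested dicts into a single level with joined key names.

    Lists are left untouched; they would still need something like
    Drill's FLATTEN to be expanded into rows.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse so any depth of nesting collapses into one level.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    sample = {"site": {"meter": {"kw": 1.5}}, "ts": 100}  # made-up record
    print(flatten(sample))
```

The trade-off is that every distinct nested path becomes its own column, so very heterogeneous structures end up as wide, sparse tables.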
Re: queries take over 2 min
Yep, I disabled all other storage plugins (default dfs, cp and custom s3 plugin) and the alter session took 0.5 sec as opposed to 169. Thank you!

Scott Kinney | DevOps

From: Jinfeng Ni
Sent: Wednesday, June 01, 2016 9:18 PM
To: user
Subject: Re: queries take over 2 min

Drill will register the schema tree for each enabled storage plugin (SP), even when the query does not refer to that particular SP. If your s3 is very slow, it's possible that such a slow SP will impact all the queries.
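For anyone wanting to repeat the experiment of switching plugins off one at a time without using the web UI, a minimal sketch follows. It assumes the documented REST endpoint GET /storage/{name}/enable/{val} is available in your Drill version; the base URL and plugin name are placeholders, and the request is not sent unless you call the second function yourself.

```python
import urllib.request

DRILL_URL = "http://localhost:8047"  # assumed address of a Drillbit's web/REST port


def plugin_toggle_url(base_url, name, enabled):
    """Build the URL for GET /storage/{name}/enable/{val}."""
    flag = "true" if enabled else "false"
    return "{}/storage/{}/enable/{}".format(base_url, name, flag)


def set_plugin_enabled(base_url, name, enabled):
    """Send the toggle request and return the response body (needs a live Drillbit)."""
    with urllib.request.urlopen(plugin_toggle_url(base_url, name, enabled)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # "s3" is a hypothetical plugin name; only the URL is printed here.
    print(plugin_toggle_url(DRILL_URL, "s3", False))
```

Disabling plugins one by one and re-running a cheap statement such as an ALTER SESSION is one way to find which plugin is slowing down schema registration.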
Re: queries take over 2 min
I do have an s3 plugin enabled. I'll try disabling it. I hope that's not it, because ultimately s3 is where we want it pointed to. I'll also try restarting the instance. I've never run across a bad instance but I'm sure it happens. Also, the instance feels normal with everything else.

Scott Kinney | DevOps

From: John Omernik
Sent: Wednesday, June 01, 2016 6:59 PM
To: user
Subject: Re: queries take over 2 min

You probably already tried this, but I have gotten "bad" instances that for whatever reason just performed horribly. Once I terminated and restarted, all was better. May be worth a try as well.

On Jun 1, 2016 8:42 PM, "Abdel Hakim Deneche" wrote:

> Sometimes, if you have an issue in one of your storage plugins, it affects
> all queries, even those not querying that specific plugin. Do you have any
> enabled storage plugin that's causing issues?
queries take over 2 min
I'm running queries on local JSON files and the queries take over 2 min. I'm running a simple drill-embedded install on an EC2 t2.large. CPU and memory utilization is very low while the query is running. Even an 'alter session set' command takes minutes.

Scott Kinney | DevOps
Re: Do i need hadoop installed to use dfs storage?
There is something wrong with my storage plugin config, because it works when I use the default dfs storage plugin and query like:

SELECT * FROM dfs.`/path/to/jsonfile.json`;

probably ... "connection": "file:///tmp/data/", ... I left the three /

Scott Kinney | DevOps

From: Nathan Griffith
Sent: Tuesday, May 31, 2016 5:23 PM
To: user@drill.apache.org
Subject: Re: Do i need hadoop installed to use dfs storage?

Hi Scott,

You definitely don't need to have Hadoop to query your local file system. Could you list the exact command to Drill that gave you this error?

Best,
Nathan Griffith
Technical Writer
Dremio
Do i need hadoop installed to use dfs storage?
I'm trying to test running Drill on gz JSON files in s3 and I keep getting:

VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Table 's3_file.gz' not found

I downloaded the file and unzipped it, then set up a new storage plugin to point to the local file:

    {
      "type": "file",
      "enabled": true,
      "connection": "file:///tmp/data/",
      "config": null,
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": "json"
        },
    ...

And I get the same thing. I do not have Hadoop installed. Do I need it?

Thanks,

Scott Kinney | DevOps
Re: connecting to s3
Perfect! Thank you so much.

Scott Kinney | DevOps

From: Nathan Griffith
Sent: Thursday, May 26, 2016 2:44 PM
To: user@drill.apache.org
Subject: Re: connecting to s3

Hi, Scott.

That article is a bit dated. Try setting up your core-site.xml file like in this post:

http://www.dremio.com/blog/how-to-query-s3-data-using-amazons-s3a-library/

Best,
Nathan Griffith
Technical Writer
Dremio
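The linked post is not reproduced here, so the following is only a guess at the shape of the file it describes: a small script that emits a core-site.xml carrying the standard Hadoop s3a credential properties (fs.s3a.access.key and fs.s3a.secret.key). The key values are placeholders, and where the file must live inside a Drill install is not covered by this sketch.

```python
import xml.etree.ElementTree as ET


def build_core_site(access_key, secret_key):
    """Return core-site.xml text holding the two s3a credential properties."""
    root = ET.Element("configuration")
    for name, value in (
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    ):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder credentials; substitute real ones outside version control.
    print(build_core_site("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

With credentials supplied this way, the storage plugin's "connection" would presumably use an s3a:// URL rather than the older jets3t-based s3 or s3n schemes.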
connecting to s3
I followed the steps here:

https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/

I have confirmed the AWS keys work.

Drill version 1.6.0
jets3t version 0.9.2

I created the storage plugin in the web UI, calling it s3-foo-bar, then ran:

use `s3-foo-bar`.`root`;

Error:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ServiceException: Service Error Message.

Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed to create schema tree: org.jets3t.service.ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>95BB171A886F3C27</RequestId><HostId>NLoKTo5jzzBS3j0mIRUeVwF8tGvTFUh/4jJrkPr+0U6kjo1e3t7uX3wv2Thkb1G/+/ZDODJXbVE=</HostId></Error>

What am I doing wrong?

Thanks all,

Scott Kinney | DevOps