Parth,
You are right. If we put t.others.additional in the select list, in addition to
t.others, then the output is wrong. The JSON file I used has 2 rows:
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
Given the sample rows that Stefan provided, the query -
select `some`, t.others, t.others.additional from `test.json` t;
does produce incorrect results -
| yes | {"additional":"last entries only"} | last entries only |
instead of
| yes | {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"} | last entries only |
On Thu, Jul 23, 2015 at 2:19 PM, Juergen Kneissl wrote:
> On 07/23/15 22:04, Jason Altekruse wrote:
> > I'm very glad to hear that it exceeded your expectations. An important
> > point I would like to add, when you unzipped the file you likely allowed
> > drill to read not only on both nodes, but
hi,
I can provide you with a JSON file and statements to reproduce it if you wish.
thank you for looking into this.
regards,
-Stefan
On Jul 23, 2015 9:03 PM, "Jinfeng Ni" wrote:
> Hi Stefán,
>
> Thanks a lot for bringing up this issue, which is really helpful to improve
> Drill.
>
> I tried to
On 07/23/15 22:04, Jason Altekruse wrote:
> I'm very glad to hear that it exceeded your expectations. An important
> point I would like to add, when you unzipped the file you likely allowed
> drill to read not only on both nodes, but also on multiple threads on each
> node. When the file was compr
Hi Stefán,
Thanks a lot for bringing up this issue, which is really helpful to improve
Drill.
I tried to reproduce the incorrect results, and I could reproduce the
missing data issue with CTAS parquet, but I could not reproduce the missing
data issue when I query the JSON file directly.
Here is ho
Thank you.
On Thu, Jul 23, 2015 at 7:24 PM, Ted Dunning wrote:
> On Thu, Jul 23, 2015 at 3:55 AM, Stefán Baxter
> wrote:
>
> > Someone must review the underlying optimization errors to prevent this
> from
> > happening to others.
> >
>
> Jinfeng and Parth are examining this issue to try to co
I'm very glad to hear that it exceeded your expectations. An important
point I would like to add, when you unzipped the file you likely allowed
drill to read not only on both nodes, but also on multiple threads on each
node. When the file was compressed, only a single thread was reading and
processing
On Thu, Jul 23, 2015 at 8:18 AM, Jacques Nadeau wrote:
> The good news is, Drill does provide a nice simple way to abstract these
> details away. You simply create a view on top of HBase [1]. The view can
> contain the physical conversions. Then users can interact with the view
> rather than t
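A minimal sketch of such a view, assuming a hypothetical HBase table
`users` whose `account` column family holds a UTF-8 `name` and a
big-endian integer `age` (these names are illustrative, not from the
thread):

  CREATE VIEW dfs.tmp.users_view AS
  SELECT CONVERT_FROM(row_key, 'UTF8') AS user_id,
         CONVERT_FROM(t.account.`name`, 'UTF8') AS name,
         CONVERT_FROM(t.account.age, 'INT_BE') AS age
  FROM hbase.`users` t;

Users can then query users_view directly and never see the byte-level
conversions.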
On Thu, Jul 23, 2015 at 3:55 AM, Stefán Baxter
wrote:
> Someone must review the underlying optimization errors to prevent this from
> happening to others.
>
Jinfeng and Parth are examining this issue to try to come to a deeper
understanding. Not surprisingly, they are a little quiet as they do
Hi Jason,
On 07/23/15 18:53, Jason Altekruse wrote:
> I could be wrong, but I believe that gzip is not a compression that can be
> split; you must read and decompress the file from start to end. In this
> case we cannot parallelize the read. This stackoverflow article mentions
> bzip2 as an alter
Drill Hangout 2015-07-21
Participants: Jacques, Parth (scribe), Sudheesh, Hakim, Khurram, Aman,
Jinfeng, Kristine, Sean
Feature list for Drill 1.2 was discussed. The following items were
considered (discussion/comments, if any, are summarized with each item):
1. Memory allocator improvements
I could be wrong, but I believe that gzip is not a compression that can be
split; you must read and decompress the file from start to end. In this
case we cannot parallelize the read. This stackoverflow article mentions
bzip2 as an alternative compression used by hadoop to solve this problem
and a
Yes, of course:
I've added the SQL and the output of EXPLAIN PLAN FOR:
-
jdbc:drill:schema=dfs> explain plan for SELECT columns[4] stichtag,
columns[10] geschlecht, count(columns[0]) anzahl FROM
dfs.`/mon_ew_xt_uni_bus_11.csv.gz` where 1 = 1 and columns[23] = 1
I don't think Drill is supposed to "ignore" data. My understanding is that
the reader will read the new fields which will cause a schema change, and
depending on the query (if all operators involved can handle the schema
change or not) the query should either succeed or fail.
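For example (hypothetical data, not from this thread), a reader hitting
these two rows sees a schema change at the second one, because a new
field "b" appears:

  {"a": 1}
  {"a": 1, "b": "new"}

Whether the query then succeeds depends on whether every operator in the
plan can handle the added column.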
My understanding is th
Hi,
The only right answer to this question must be to a) "adapt to additional
information" and b) "try its hardest to accommodate changes".
The current behavior must be seen as completely worthless (sorry for the
strong language).
Regards,
-Stefan
On Thu, Jul 23, 2015 at 4:16 PM, Matt wrote:
On 23 Jul 2015, at 10:53, Abdel Hakim Deneche wrote:
When you try to read schema-less data, Drill will first investigate the
first 1000 rows to figure out a schema for your data, then it will use this
schema for the remainder of the query.
To clarify, if the JSON schema changes on the 1001st or 1MMth
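One workaround that sometimes helps with JSON schema changes is to read
every value as text and cast explicitly, e.g. (the path and the `amount`
field here are made up):

  ALTER SESSION SET `store.json.all_text_mode` = true;
  SELECT CAST(t.amount AS DOUBLE) FROM dfs.`/data/test.json` t;

This avoids numeric/text type conflicts between rows, though it does not
help when a field is entirely absent from the rows Drill samples.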
Hi Juergen,
can you share the query you tried to run?
Thanks
On Thu, Jul 23, 2015 at 9:10 AM, Juergen Kneissl wrote:
> Hi everybody,
>
> I installed and configured a small cluster with two machines (gnu/linux)
> with the following setup:
>
> zookeeper in version 3.4.6, drill in version 1.1.0
Hi everybody,
I installed and configured a small cluster with two machines (gnu/linux)
with the following setup:
zookeeper in version 3.4.6, drill in version 1.1.0, and also using
hadoop (version 2.7.1) HDFS as the distributed filesystem.
So, I am playing around a bit, but what I am still not understandin
Even for csv or json format, directory-based Partition pruning [1] could be
leveraged to prune data. You have to use the special dir* field in your
query to filter out unwanted data, or define a view which uses the dir* field
and then query against the view.
1. https://drill.apache.org/docs/partition
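For example, with files laid out as /logs/2015/07/*.csv (a hypothetical
layout), a filter on the dir* fields lets the planner skip everything
outside July 2015:

  SELECT COUNT(*) FROM dfs.`/logs` WHERE dir0 = '2015' AND dir1 = '07';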
Unfortunately, HBase hasn't embraced embedded schema last I checked. There
are definitely tools on top of HBase that do provide this. For example I
believe Wibi and Cask both provide a more structured approach on top of
HBase. Someone could extend the plugin to support these systems.
The good n
Hi Abdel,
Thank you for taking the time to respond. I know my frustration is leaking
through, but that does not mean I don't appreciate everything you and the
Drill team are doing; I do.
I also understand the premise of the optimization, but I find it too
restrictive and it certainly does not fit our d
Hi Hafiz,
I guess it depends on the query. Generally Drill will try to push any
filter you have in your query to the leaf nodes so they won't send any row
that doesn't pass the filter. Also only the columns that appear in the
query will be loaded from the file.
The file format you are querying al
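As a rough sketch (the plugin name, path, and columns are hypothetical),
for a query like

  SELECT t.user_id, t.event
  FROM s3.`/data/events.parquet` t
  WHERE t.event = 'click';

only the user_id and event columns are read from the Parquet file, and
the filter is applied as close to the scan as possible, so rows that
don't match are never shipped around the cluster.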
Hi all!
I want to know how Drill works. Suppose I query data on S3, and the
volume of data is huge, in the GBs. When I query that data, what happens?
Does Drill load the whole dataset onto the Drill nodes, or does it query
the data without loading all of it?
Hi Stefan,
Sorry to hear about your misadventure in Drill land. I will try to give you
some more information, but I also have limited knowledge for this specific
case and other developers will probably jump in to correct me.
When you try to read schema-less data, Drill will first investigate the
Sounds great. The docs are written in Markdown and stored in GitHub Pages.
You can contribute to the docs using a pull request. Click the pencil icon
on the top right side of the page, and go for it. Thanks much, really
appreciate your feedback and help.
Kristine Hahn
Sr. Technical Writer
415-497-
Thank you for pointing me to this section - somehow I missed it.
How do you maintain this documentation? Maybe I have time to add more
examples, so it will be easier for other people to start working with
the HBase/Drill combo.
On Thu, Jul 23, 2015 at 3:38 PM, Kristine Hahn wrote:
> These data type
These data types are listed at
http://drill.apache.org/docs/supported-data-types/#convert_to-and-convert_from,
but need to be easier to find and include useful examples as Ted pointed
out. Sorry you had a problem. We'll add links to the types from strategic
places.
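For instance (with hypothetical table and column names), decoding a
big-endian integer stored by HBase:

  SELECT CONVERT_FROM(t.f.status, 'INT_BE') AS status
  FROM hbase.`responses` t;

This matches the INT_BE fix mentioned later in this thread, where the
default interpretation produced negative numbers.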
On Thursday, July 23, 2015, Alex Ot
Hi,
The workaround for this was to edit the first line in the json file and
fake a value for the "additional" field.
That way the optimizer could not decide to ignore it.
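For illustration (this is not the actual data), the edited first line
looked something like:

  {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"placeholder"}}

With "additional" present from the first row, the schema detected up
front includes the field, so it is no longer dropped.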
Someone must review the underlying optimization errors to prevent this from
happening to others.
JSON data, which is unstruct
Thank you very much for your kind interest. It's very unfortunate that I am
currently stuck somewhere else. I will share sample data with you in a day
or so.
Usman Ali
On Thu, Jul 23, 2015 at 6:59 AM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:
> Hi
> Do you still see this issue.
Thank you Jacques
The INT_BE did the trick - now I'm getting status 200 instead of the
negative number. The problem is that I haven't seen any mention of this
type anywhere in the documentation - maybe the corresponding section of the
conversions should be expanded, because it refers only to sta