Re: Avro - Let's talk Avro again

2017-08-19 Thread Stefán Baxter
may not sound nice to the ears but is exactly the kind of feedback > that will make this project truly successful. > > Best, > Saurabh > > > > I > > On Fri, Aug 18, 2017 at 1:42 PM, Stefán Baxter > wrote: > > > Hi John, > > > > Love Drill but

Re: Drill schema handling [Was: Avro - Let's talk Avro again]

2017-08-19 Thread Stefán Baxter
trying to prioritize > > given what we have. But we do not have to feel constrained. We can get > more > > developers to participate in this and help out. And I am very positive > > about that approach-I know that I helped a user here to get help on using > > Apache Drill insid

Re: Avro - Let's talk Avro again

2017-08-18 Thread Stefán Baxter
> I was guessing you would chime in with a response ;) > > > > Are you still using Drill w/ Avro? How have things been lately? > > > > On Thu, Aug 17, 2017 at 8:00 AM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > > > >>

Re: Avro - Let's talk Avro again

2017-08-17 Thread Stefán Baxter
woha!!! (sorry, I just had to) Best of luck with that! Regards, -Stefán On Thu, Aug 17, 2017 at 12:37 PM, John Omernik wrote: > I know Avro is the unwanted child of the Drill world. (I know others have > tried to mature the Avro support and that has been something that still is > in a "exp

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Stefán Baxter
is present in the subsequent data pages. This would (most likely) be done > > during execution time, and I don't believe Drill does that as yet. > > > > > > > > <http://www.mapr.com/> > > > > > > From: Stef

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Stefán Baxter
for comparison is what makes the dependency on > min-max statistics by the Parquet library be unreliable. > > > ________ > From: Stefán Baxter > Sent: Monday, May 29, 2017 1:41:30 PM > To: user > Subject: Parquet filter pushdown and string fields t

Parquet filter pushdown and string fields that use dictionary encoding

2017-05-29 Thread Stefán Baxter
those unique values to facilitate "segment pruning" when looking for data belonging to individual sessions/customers. Best regards, -Stefán Baxter

Re: Batch load of unstructured data in Drill

2016-12-08 Thread Stefán Baxter
Hi, Have you considered batching them up into a nicely defined directory structure and using directory pruning as part of your queries? I ask because our batch processes do that. Data is arranged into Hour, Day, Month, Quarter, Year structures (which we then roll up in different ways, based on v

Re: Batch load of unstructured data in Drill

2016-12-07 Thread Stefán Baxter
Hi Alexander, Drill allows you to both a) query the data directly in JSON format and b) convert it to Parquet (have a look at the CTAS statement) Hope that helps, -Stefán On Wed, Dec 7, 2016 at 1:08 PM, Alexander Reshetov < alexander.v.reshe...@gmail.com> wrote: > Hello, > > I want to load batch

Re: Reading Avro Arrays

2016-04-12 Thread Stefán Baxter
me" : "field1", > >> > > "type" : "int" > >> > > } ] > >> > > }, > >> > > "java-class" : "java.util.List" > >> > > } > >> >

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
s today, and could really help > new users adopt Drill if they are using other data formats. > > Jason Altekruse > Software Engineer at Dremio > Apache Drill Committer > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter > wrote: > > > Yes Parth, you are 100% right and

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
ed to solve. From there, you can continue to expect > that > > people will help you--as they can. There are no guarantees in open > source. > > Everything comes through the kindness and shared goals of those in the > > community. > > > > thanks, > > Jacques

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
e in the > community. > > thanks, > Jacques > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter > wrote: > > > Hi, > > > > Is it at all possible that we are the only company trying to use Avr

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
an continue to expect that > people will help you--as they can. There are no guarantees in open source. > Everything comes through the kindness and shared goals of those in the > community. > > thanks, > Jacques > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > >

Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
Hi, Is it at all possible that we are the only company trying to use Avro with Drill to some serious extent? We continue to come across all sorts of embarrassing shortcomings like the one we are dealing with now where a schema change exception is thrown even when working with a single Avro file (

Re: Drills' support for Avro Union types and

2016-03-28 Thread Stefán Baxter
It's quite plausible that this has nothing to do with union types. I just assumed that a simple Avro schema must be fully supported, but that should not be taken for granted. On Mon, Mar 28, 2016 at 3:40 PM, Stefán Baxter wrote: > Hi, > > I have reworked/refactored our Avro based logging

Drills' support for Avro Union types and

2016-03-28 Thread Stefán Baxter
Hi, I have reworked/refactored our Avro based logging system trying to make the whole Drill + Avro->Parquet experience a bit more agreeable. Long story short, I'm getting this error when selecting from multiple Avro files even though these files share the EXACT same schema: Error: UNSUPPORTED_OP

Re: Reading Avro Arrays

2016-03-25 Thread Stefán Baxter
bjects with drill; I somehow got that wrong from > the > > documentation... > > > > Cheers, > > Johannes > > > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > > > > > FYI: flattening of embedded

Re: Reading Avro Arrays

2016-03-24 Thread Stefán Baxter
> reference. I tried drill 1.6, the data is an array of complex objects > though. I will try to setup a drill dev environment and see if i can modify > the tests to fail. > > Johannes > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter > wrote: > > > FYI. this seems to

Professional services for Drill

2016-03-23 Thread Stefán Baxter
Hi, Is there anyone here that provides professional services for Drill? We are trying to optimize our system in order to speed up smaller queries and aiming for sub-second response times when dealing with < 100 million records from Parquet. We are, for example, looking at profiles where EXTERNAL

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
FYI. this seems to be working in 1.6, at least on the Avro data that we have. On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter wrote: > Hi again, > > What version of Drill are you using? > > Regards, > - Stefán > > On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
Hi again, What version of Drill are you using? Regards, - Stefán On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter wrote: > Hi Johannes, > > As great as Drill is the Avro plugin has been a source of frustration for > us @activitystream. > > We have a small UDF library [1] (apac

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
Hi Johannes, As great as Drill is, the Avro plugin has been a source of frustration for us @activitystream. We have a small UDF library [1] (apache licensed) which contains a function that can return an array (List) from Avro as a CSV list. You could use that to roll your own or provide me with a smal

Re: Avro storage strategy?

2016-03-08 Thread Stefán Baxter
Hi, We use Avro to store/accumulate/batch streaming data and then we migrate it to Parquet. We then use union queries to merge fresh and historical data (Avro + Parquet). Things to keep in mind (AFAIK): - Avro is a lot slower and less efficient, storage space and performance wise, than P

Avro no longer selects data correctly from a sub-structure :: 1.6-SNAPSHOT

2016-03-05 Thread Stefán Baxter
Hi, This used to work in 1.5 and I think it must be a regression. Parquet: 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/processed/<>/transactions` as s limit 1; ++ | EXPR$0 | ++ | 87.55.171.210 | ++ 1 row selected (1.184 sec

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
low.com/questions/2068682/why-cant-i-use-alias-in-a-count-column-and-reference-it-in-a-having-clause > > On Fri, Mar 4, 2016 at 12:40 PM, Stefán Baxter > wrote: > > Having fails as well > > > > On Fri, Mar 4, 2016 at 8:00 PM, Bob Rumsby wrote: > > > >> Witho

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
> On Fri, Mar 4, 2016 at 11:53 AM, Stefán Baxter > wrote: > > > Hi, > > > > Having adds to the trouble and claims that the field needs to be grouped > > and then fails the same way if it's added to group by. > > > > I ended up wrapping this in a "

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
ote: > Try using the HAVING clause. The WHERE clause cannot constrain the results > of aggregate functions. > http://drill.apache.org/docs/having-clause/ > > On Fri, Mar 4, 2016 at 11:34 AM, Stefán Baxter > wrote: > > > Hi, > > > > I'm using parquet+

Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
Hi, I'm using parquet+drill and the following statement works just fine: select sold_to, count(*) as trans_count from dfs.asa.`/processed/venuepoint/transactions` group by sold_to; When adding this where clause, nothing is returned: select sold_to, count(*) as trans_count from dfs.asa.`/tra

Avro support in Drill - Missing support for the IN operator and other frustrating things

2016-02-25 Thread Stefán Baxter
o warn people on the Drill website that the Avro support is experimental, at best - Stefán Baxter

Issue with compression :: Using Drill 1.5 and parquet-mr/parquet-avro 1.8.1

2016-02-08 Thread Stefán Baxter
Hi, I'm using the following Avro Parquet writer to convert our Avro to Parquet: writer = AvroParquetWriter .builder( new Path( parquetFile.getPath() )) .withSchema(schema) .enableDictionaryEncoding() .withDataModel(ReflectData.get()) .withWriterVersion(Parquet

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
our complex parquet > reader. We are currently depending on 1.8.1 in Drill, so it should be > compatible. > > I think it would be safest to run with `store.parquet.use_new_reader` set > to true if you were going to be working with parquet 2.0 files right now. > > - Jason > >

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
tchas Regards, -Stefán On Thu, Feb 4, 2016 at 4:51 PM, Stefán Baxter wrote: > Hi again, > > I did a little test and ~5 million fairly wide records take 791 MB in > parquet without dictionary encoding and 550MB with dictionary encoding > enabled (The non-dictionary encoded file is a

Writing Drill compatible Parquet in Java using parquet-mr

2016-02-04 Thread Stefán Baxter
Hi, What things do I need to know if I want to write Drill compatible Parquet in Java using Parquet-MR? - The latest stable version of Parquet-MR is 1.8.1; is that too new? - Will the standard Parquet work? - Is any specific footer information required? - Are there any dos and don'ts? I wan

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
in ~20% less time than the one that uses dictionary encoding. Regards, -Stefán On Thu, Feb 4, 2016 at 3:48 PM, Stefán Baxter wrote: > Hi Jason, > > Thank you for the explanation. > > I have several *low* cardinality fields that contain semi-long values and > they are,

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
ce, and so we actually take a > performance hit re-materializing the data out of the dictionary upon read. > > If you would be interested in trying to contribute such an enhancement I > would be willing to help you get started with it. > > - Jason > > On Wed, Feb 3, 2016 a

Re: Convert ISO 8601 string to timestamp

2016-02-03 Thread Stefán Baxter
Hi, This small UDF project contains an asTimestamp function that you may find useful: https://github.com/activitystream/asdrill It accepts multiple value types and returns them as timestamp. Feel free to do with it what you will :) Regards, -Stefán On Wed, Feb 3, 2016 at 6:22 PM, Jason Altekru

Parquet drill date fields

2016-02-03 Thread Stefán Baxter
Hi, I'm converting Avro to Parquet and I'm getting this log entry back for a timestamp field: Written 1,008,842B for [occurred_at] INT64: 591,435 values, 2,169,557B raw, 1,008,606B comp, 5 pages, encodings: [BIT_PACKED, PLAIN, PLAIN_DICTIONARY, RLE], dic { 123,832 entries, 990,656B raw, 123,832B

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
eb 2, 2016 at 9:07 AM, Abdel Hakim Deneche > wrote: > > Thanks > > > > On Tue, Feb 2, 2016 at 9:03 AM, Stefán Baxter > > > wrote: > > > >> https://issues.apache.org/jira/browse/DRILL-4339 > >> > >> On Tue, Feb 2, 2016 at 4:46 PM, Abdel Hakim

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
https://issues.apache.org/jira/browse/DRILL-4339 On Tue, Feb 2, 2016 at 4:46 PM, Abdel Hakim Deneche wrote: > Hi Stefán, > > Can you open a JIRA for this, please ? > > Thanks > > On Tue, Feb 2, 2016 at 6:21 AM, Stefán Baxter > wrote: > > > Hi, > > >

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
Hi, I can confirm that this same query+avro-files work in 1.4 so this is probably a regression Regards, -Stefan On Tue, Feb 2, 2016 at 1:59 PM, Stefán Baxter wrote: > Hi, > > I'm getting this error on master/head using the Avro Reader: > > "what ever the mind of man

Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
Hi, I'm getting this error on master/head using the Avro Reader: "what ever the mind of man can conceive and believe, drill can query" 0: jdbc:drill:zk=local> select * from dfs.asa.`/`; Exception in thread "drill-executor-2" java.lang.NoSuchMethodError: org.apache.drill.exec.store.avro.AvroRecord

Re: Hangout Starting

2016-01-29 Thread Stefán Baxter
Hi Scott, Could you please tell me a bit more about Drill+Ignite? Regards, -Stefan On Fri, Jan 29, 2016 at 2:57 PM, scott cote wrote: > Anyone working on a DrillBit that can poke into an ignite grid? > > SCott > > > On Jan 26, 2016, at 11:58 AM, Jacques Nadeau wrote: > > > > https://plus.goog

Re: Re: How to keep s3 data in memory with apache drill?

2016-01-26 Thread Stefán Baxter
Hi, I got an email from the Tachyon team a while back where they informed me of this change. I think you should visit their google group and check the status of this change. Regards, -Stefán On Tue, Jan 26, 2016 at 9:28 PM, Stephan Kölle wrote: > I'm working with tachyon 0.8.2 (November 11,

Re: How to keep s3 data in memory with apache drill?

2016-01-26 Thread Stefán Baxter
Hi, I think the latest version of Tachyon uses a transparent storage structure. Regards, -Stefán On Tue, Jan 26, 2016 at 10:05 AM, Stephan Kölle wrote: > Querying JSON data stored on aws s3 with apache drill works awesome, but > drill fetches the data fresh from s3 for every query. > > How t

Re: JDBC Driver - Possible regression

2016-01-20 Thread Stefán Baxter
https://issues.apache.org/jira/browse/DRILL-4291 -Stefan On Wed, Jan 20, 2016 at 3:08 PM, Abdel Hakim Deneche wrote: > Stefán, > > Please reopen the JIRA and add a comment describing what you are seeing. > > Thanks > > On Wed, Jan 20, 2016 at 4:34 AM, Stefán Baxter >

Re: JDBC Driver - Possible regression

2016-01-20 Thread Stefán Baxter
Hi again, We have verified that the error exists on master:head (1.5-SNAPSHOT). Regards, -Stefan On Wed, Jan 20, 2016 at 10:39 AM, Stefán Baxter wrote: > Hi, > > We are using the 1.5-SNAPSHOT version of the JDBC driver (all) and we > seem to be getting this old thin

JDBC Driver - Possible regression

2016-01-20 Thread Stefán Baxter
Hi, We are using the 1.5-SNAPSHOT version of the JDBC driver (all) and we seem to be getting this old thing: https://issues.apache.org/jira/browse/DRILL-2482 We are either doing something wrong or this is a regression. Has anyone else experienced not being able to get nested structures

Re: New Slack setup for Devs and Users

2016-01-19 Thread Stefán Baxter
if the invitation stands then I would like to get access to Slack. Regards, -ste...@activitystream.com On Thu, Oct 1, 2015 at 10:08 AM, Jacques Nadeau wrote: > Hey Guys, > > We've been using Slack a lot internally and have found it very useful. I > setup a new slack for Drill developers and u

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-18 Thread Stefán Baxter
continue with our Lucene based plan? We are more than happy to sponsor this work and/or pay for professional service to anyone that has the knowledge and the time to assist us. Regards, -Stefán On Sun, Jan 17, 2016 at 7:50 PM, Stefán Baxter wrote: > Hi Rahul, > > I'm aware

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-17 Thread Stefán Baxter
n applying the join condition, only 10k rows are needed from the right > > side. > > > > How long does it take to read a few million records from Lucene? > (Recently > > with Elastic we've been seeing ~50-100k/second per thread when only > > retrieving a single

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-17 Thread Stefán Baxter
ing ~50-100k/second per thread when only > retrieving a single stored field.) > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Jan 16, 2016 at 12:11 PM, Stefán Baxter > > wrote: > > > Hi Jacques, > > > > Thank you for taking the ti

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Stefán Baxter
ns all the information we need for the join (stored fields). I guess this would make more sense to people if we said we were using Solr or Elastic Search but this use-case is not as complex as the one detailed in DRILL-3929. Regards, -Stefan On Sat, Jan 16, 2016 at 8:11 PM, Stefán Baxter wr

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Stefán Baxter
that would give us a better idea of how to point you in > the right direction. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Jan 16, 2016 at 5:18 AM, Stefán Baxter > wrote: > > > Hi, > > > > Can anyone point me to an implementation

Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Stefán Baxter
Hi, Can anyone point me to an implementation where joins are implemented with full support for filters and efficient handling of joins based on indexes? The only code I have come across all seems to rely on a complete scan of the related table and that is not acceptable for the use case we are work

Re: Lucene Plugin :: Join Filter and pushdown

2016-01-15 Thread Stefán Baxter
ct col1 from tbl1)) > > Any other suggestions or pointers are appreciated > > - Rahul > > > On Thu, Jan 14, 2016 at 2:52 PM, Stefán Baxter > wrote: > > > Hi, > > > > I'm working on the Lucene plugin (see previous email) and the focus now > is &

Lucene Plugin :: Join Filter and pushdown

2016-01-14 Thread Stefán Baxter
Hi, I'm working on the Lucene plugin (see previous email) and the focus now is support for joins with filter push-down to avoid the table scan that is provided by default. I'm fairly new to Drill and in over my head, to be honest, but this is fun and with this addition the Lucene plugin c

Re: 1.5-SNAPSHOT and UDFs

2016-01-14 Thread Stefán Baxter
Hi again, I modified this slightly and now it works (it's accepting ValueHolder and resolving that function correctly) regards, -Stefan On Thu, Jan 14, 2016 at 8:25 AM, Stefán Baxter wrote: > Hi Jacques, > > I'm still struggling with this. The function you seem to be

Re: 1.5-SNAPSHOT and UDFs

2016-01-14 Thread Stefán Baxter
: f544f3fd-c1fa-4a45-84e7-ae31ae095427 on Lightning:31010] Please note that the construct parameters and the expected classes match 100% Regards, -Stefan On Tue, Jan 12, 2016 at 3:35 PM, Stefán Baxter wrote: > > OK, I came across this in some UDF sample code and welcome the "best &

Re: Community Drill UDFs

2016-01-13 Thread Stefán Baxter
Hi, This is my playground https://github.com/activitystream/asdrill Please feel free to do with it what you will. I'd be more than happy to participate in a community driven UDF project. Regards, Stefán On Wed, Jan 13, 2016 at 10:28 PM, Ted Dunning wrote: > There is https://github.com/mapr-dem

Re: 1.5-SNAPSHOT and UDFs

2016-01-12 Thread Stefán Baxter
g = ByteBufUtil.decodeString(myBuf.nioBuffer(), > Charsets.UTF_8); > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Jan 8, 2016 at 12:04 PM, Stefán Baxter > wrote: > > > So, > > > > This is happening due to changes made to DrillBuffer as a par

Minor assistance with a Lucene plugin (reader for now)

2016-01-11 Thread Stefán Baxter
Hi, Rahul Challapalli started to implement a Lucene reader a while back and I'm trying to pitch in. (#1/#2) I have made some progress but I could benefit from talking to someone who knows their way around the implementation of a reader/writer before I continue. Discussion points: - Best pract

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
just mine. It's also interesting that this signature is formatted/created every time a value is fetched. Regards, -Stefán On Fri, Jan 8, 2016 at 7:48 PM, Stefán Baxter wrote: > Hi again, > > This code can be used to reproduce this behavior: > > @Func

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
ocIfNeeded(someValue.length()); for (Byte aByte : someValue.toString().getBytes()) output.buffer.setByte(output.end ++, aByte); } } On Fri, Jan 8, 2016 at 7:43 PM, Stefán Baxter wrote: > Hi, > > This seems to have something to do with reading string values from a > VarC

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
value but now it returns: {DrillBuf[77], udle identityHashCode == 1660956802, identityHashCode == 343154168} PT1H The value is there in the second line (Seems to include a newline character) Any ideas? Regards, -Stefan On Fri, Jan 8, 2016 at 7:24 PM, Stefán Baxter wrote: > Hi, > >

1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
Hi, My UDFs have stopped working with the latest version of 1.5-SNAPSHOT (pulled just now). The error is: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "{DrillBuf[74], udle identityHash..." Fragme

Re: Does drill recognize new line correctly?

2016-01-06 Thread Stefán Baxter
test \n value"} > > => No result found. > # This is valid JSON, I believe isn't it? > > Target JSON: > Same file with copied row and edited. > > {"a": "test value"} > {"a": "test \n value"} > >

Re: Does drill recognize new line correctly?

2016-01-05 Thread Stefán Baxter
Hi, I'm not the right person to give you an answer from the Drill perspective but is it possible that your JSON serializer is not escaping characters that should be escaped? please see: - http://stackoverflow.com/questions/4253367/how-to-escape-a-json-string-containing-newline-characters-

A single users view/opinion of Drill

2015-12-27 Thread Stefán Baxter
Hi Drillers, I have been meaning to share some thoughts on Drill for a long time and what I, or we at Activity Stream, believe would make Drill better (for us). Please keep in mind that this is a single sided view from a simple, non-contributing, user and please excuse my English. We love using D

Re: Avro - Schema is good - Schema validation is bad

2015-12-25 Thread Stefán Baxter
ps://cwiki.apache.org/confluence/display/PIG/AvroStorage>. >- If the schema validation flag is set, then we can consider the union >schema of all the files in a directory recursively. > > > On Fri, Dec 18, 2015 at 9:17 AM, Stefán Baxter > wrote: > > > Hi Kamesh, >

Re: Join with empty table

2015-12-18 Thread Stefán Baxter
Hi Andries, Where does it say that a query for a non-existing file in unions/joins should fail? I ask because I'm interested in the "basic rules of Drill". Regards, -Stefán On Fri, Dec 18, 2015 at 4:37 PM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > 1) When the table/file does

Re: Avro - Schema is good - Schema validation is bad

2015-12-17 Thread Stefán Baxter
lready JIRA for this. > > https://issues.apache.org/jira/browse/DRILL-4120?focusedCommentId=15048070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15048070 > > > On Thu, Dec 17, 2015 at 1:28 AM, Stefán Baxter > wrote: > > > Hi, > >

Re: Avro - Schema is good - Schema validation is bad

2015-12-16 Thread Stefán Baxter
ither have to consider the complete union > Schema before pruning files or consider all fields as either known or > possible after pruning files. > > Stefan, if you haven't already, please open a bug that known fields are > failing to validate in Avro and we will fix shortly. Sorry

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
On Tue, Dec 15, 2015 at 1:10 AM, Ted Dunning wrote: > Sigh of relief is premature. Nobody has committed to carrying this > interpretation forward. > > > > On Mon, Dec 14, 2015 at 11:44 AM, Stefán Baxter > > wrote: > > > /me sighs of relief > > >

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
he files that I see could > include or exclude more recent files that have added a new field. > > That means that a query would succeed or fail according to which date range > I use for the query. > > That seems pretty radically bad. > > > > > On Mon, Dec 14, 2

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
ructs the schema for them and hence nulls > for invalid fields. > > > On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter > wrote: > > > Hi, > > > > I'm getting the following error when querying Avro files: > > > > Error: VALIDATION ERROR: From line

Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
Hi, I'm getting the following error when querying Avro files: Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57: Column 'some_col' not found in any table It's true that the field is in none of the tables I'm targeting, in that particular query, but that does not mean that it i

apache drill master buildin 1.3.0-SNAPSHOT

2015-11-25 Thread Stefán Baxter
Hi, Why is the master branch not building 1.4.0-SNAPSHOT? Just wondering, -Stefan

Re: apache drill master buildin 1.3.0-SNAPSHOT

2015-11-25 Thread Stefán Baxter
sorry... my bad :) On Wed, Nov 25, 2015 at 11:28 PM, Stefán Baxter wrote: > Hi, > > Why is the master branch not building 1.4.0-SNAPSHOT? > > Just wondering, > -Stefan >

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
thnx :) On Sat, Nov 21, 2015 at 11:00 PM, Jacques Nadeau wrote: > It looks like DRILL-3810 caused the regression (and it is limited to Avro > files). Hopefully Kamesh can take a quick look. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Nov 21, 2015 a

CTAS - Converting Avro files to parquet - Missing timestamp datatype

2015-11-21 Thread Stefán Baxter
Hi, We are using Avro files for all our logging and they contain long timestamp_mills values. When they are converted to Parquet using CTAS we either need a hint (or something) to ensure that these columns become Timestamp values in parquet - or - we need to create a complex select with casting.

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
Hi, I just created this: https://issues.apache.org/jira/browse/DRILL-4120 Regards, -Stefan On Sat, Nov 21, 2015 at 9:28 PM, Stefán Baxter wrote: > > After some digging around there is an explanation. > > This all works fine when the directory structure contains Parquet files >

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
/src/data/stuff` where dir0 = 's1' limit > 1; > ++ > | l_partkey | > ++ > | 1552 | > +----+ > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Nov 21, 2015 at 3:47 AM, Stefán Baxter > wrote: > > > Hi

'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
Hi, I'm running the latest 1.3 build and I can no longer use dirN in my queries. The query: select * from dfs.asa.`/some-root-dir/` as s where dir0 = '2015-11-19'; The error I get is: Error: "VALIDATION ERROR: From line 1, column 72 to line 1, column 75: Column 'dir0' not found in any table" A

Searching in Arrays - Parquet - REPEATED_CONTAINS

2015-11-20 Thread Stefán Baxter
Hi, I'm trying to use an array in Parquet to store a list of IDs (1:* scenario) as opposed to putting each ID in a separate field. (array contains 1-10 values) This requires me to use REPEATED_CONTAINS to search for these values. I was expecting a performance penalty but it turns out that searching

Re: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-13 Thread Stefán Baxter
ck onto the 1.3 > branch for inclusion in the release. > > Please try this out to see if there are remaining issues reading your data. > > https://github.com/jaltekruse/incubator-drill/tree/4056-avro-corruption-bug > > Thanks, > Jason > > > > On Fri, Nov 13, 2015

Re: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-13 Thread Stefán Baxter
interpreted like double and failing when a string eventually comes along) . - Stefan On Wed, Nov 11, 2015 at 10:14 PM, Stefán Baxter wrote: > Hi, > > Can someone please verify that this is in fact a bug so I can rule out our > own mistakes? > > We have recently moved all our

Re: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-11 Thread Stefán Baxter
please point me in the right direction if I was to try to fix this myself. Regards, -Stefán On Tue, Nov 10, 2015 at 2:41 PM, Stefán Baxter wrote: > Thank you Kamesh. > > I have created https://issues.apache.org/jira/browse/DRILL-4056 with the > description. > I will send you a co

Parquet and dictionary based encoding in Drill 1.3

2015-11-10 Thread Stefán Baxter
Hi, Is it safe to switch on store.parquet.enable_dictionary_encoding and is the scanning of dictionary based columns optimized? Regards, -Stefán

Re: UDFs, RepeatedVarCharHolder and null values

2015-11-10 Thread Stefán Baxter
ues and it just passes the null back. Can't say I like the approach but it works. Regards, -Stefan On Tue, Nov 10, 2015 at 1:24 PM, Stefán Baxter wrote: > Hi, > > I have a UDF that deals with arrays and concatenates their value. It's > working fine with JSON but when

Re: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-10 Thread Stefán Baxter
sample schema and sample input to > reproduce it. I will look into this. > > On Tue, Nov 10, 2015 at 7:55 PM, Stefán Baxter > wrote: > > > Hi, > > > > I have an Avro file that support the following data/schema: > > > > {"field":"some",

Avro deserialization bug - 1.3-SNAPSHOT

2015-11-10 Thread Stefán Baxter
Hi, I have an Avro file that supports the following data/schema: {"field":"some", "classification":{"variant":"Gæst"}} When I select 10 rows from this file I get: +-+ | EXPR$0| +-+ | Gæst| | Voksen | | Voksen

UDFs, RepeatedVarCharHolder and null values

2015-11-10 Thread Stefán Baxter
Hi, I have a UDF that deals with arrays and concatenates their values. It's working fine with JSON but when working with Avro it returns an error. The error seems a bit misleading as it claims to be both a schema change exception and a missing function exception. *The error is:* Error: SYSTEM ERR

Re: UDFs and 1.3

2015-11-10 Thread Stefán Baxter
>> drill-module.conf points at the right package names. >> >> Note here, specifically the addition of package names: >> >> >> https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/resources/drill-module.conf >> >> >>

Re: UDFs and 1.3

2015-11-09 Thread Stefán Baxter
resources/drill-module.conf > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Mon, Nov 9, 2015 at 11:30 AM, Stefán Baxter > wrote: > > > Hi, > > > > Now they are but the outcome remains the same. > > > > Any additional point

Re: UDFs and 1.3

2015-11-09 Thread Stefán Baxter
Hi, Now they are but the outcome remains the same. Any additional pointers? Regards, -Stefan On Mon, Nov 9, 2015 at 7:21 PM, Stefán Baxter wrote: > Hi Nathan, > > thank you for a prompt reply. > > I thought they were but they are in fact compiled with the 1.2 dependency >

Re: UDFs and 1.3

2015-11-09 Thread Stefán Baxter
's always > nice to start with some sanity checks :) > > Best, > Nathan > > On Mon, Nov 9, 2015 at 10:56 AM, Stefán Baxter > wrote: > > Hi, > > > > I have a small set of UDFs that I have been running with Drill 1.1/1.2 > > which I'm trying to

UDFs and 1.3

2015-11-09 Thread Stefán Baxter
Hi, I have a small set of UDFs that I have been running with Drill 1.1/1.2 which I'm trying to get working with 1.3 to no avail. It's as if the library is not picked up correctly even though the error I get indicates a missing function signature (variant): Error: VALIDATION ERROR: From line 1, co

Re: Drill and Parquet - Best practices - part 1

2015-11-05 Thread Stefán Baxter
st against dictionary encoded files, > because we just go ahead and materialize all of the dictionary values into > the full dataset right away at the reader, so we don't currently do any > dictionary based filtering right now. > > Looking back in this thread seems like there are a lot of

Re: Drill and Parquet - Best practices - part 1

2015-11-03 Thread Stefán Baxter
Hi again, Are incremental timestamp values (long) being encoded in Parquet as incremental values? (This option in parquet to refrain from storing complete numbers and store only the delta between numbers to save space) Regards, -Stefan On Mon, Nov 2, 2015 at 5:54 PM, Stefán Baxter wrote
