Hi Jason,

Is it possible that the Avro plugin does not use any parallelism and that all the target files are scanned sequentially by the same process? (This is on Drill 1.5.)
- Stefán

On Fri, Feb 26, 2016 at 8:04 PM, Stefán Baxter <ste...@activitystream.com> wrote:

> Thank you Jason.
>
> I do realize that this is an OS project and that everyone is doing their
> best.
>
> There are just a few things I wish I had realized before switching over
> from JSON to Avro that have caused us a lot of problems and taken a long
> time.
>
> Your work is appreciated and I apologize for letting my frustration get
> the better of me.
>
> - Stefán
>
> On Fri, Feb 26, 2016 at 8:00 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
>> Stefan,
>>
>> I'm sorry that we have not been better about getting back to the issues
>> you have filed against the Avro reader. We do appreciate all of the effort
>> you have put into filing thorough bugs and being active in the discussions
>> on the list. I have responded on the bug you filed on this issue [1] with a
>> workaround and will be posting a patch shortly with a fix.
>>
>> - Jason <https://issues.apache.org/jira/browse/DRILL-4120>
>>
>> [1] - https://issues.apache.org/jira/browse/DRILL-4441
>> <https://issues.apache.org/jira/browse/DRILL-4120>
>>
>> On Thu, Feb 25, 2016 at 12:29 PM, Stefán Baxter <ste...@activitystream.com>
>> wrote:
>>
>> > Hi,
>> >
>> > This query targets Avro files in the latest 1.5 release:
>> >
>> > 0: jdbc:drill:zk=local> select count(*) from
>> > dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to =
>> > 'Customer/4-2492847';
>> > +---------+
>> > | EXPR$0  |
>> > +---------+
>> > | 5788    |
>> > +---------+
>> >
>> > 0: jdbc:drill:zk=local> select count(*) from
>> > dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to IN
>> > ('Customer/4-2492847');
>> > +---------+
>> > | EXPR$0  |
>> > +---------+
>> > | 0       |
>> > +---------+
>> >
>> > It shows that the IN operator does not work with Avro (works with
>> > Parquet).
>> >
>> > This finally tips us over. We have invested hundreds of hours moving all
>> > streaming/fresh data from JSON to Avro but the Avro part of Drill is
>> > broken in too many ways to recommend its use to anyone.
>> >
>> > Attempts to report Avro errors and shortcomings, like the missing
>> > support for dirX, have had no results.
>> >
>> > I think it would be prudent to warn people on the Drill website that the
>> > Avro support is experimental, at best.
>> >
>> > - Stefán Baxter