Re: Integration with Spark

2017-01-27 Thread Hanifi GUNES
I authored the initial Drill-on-Spark (DoS) plugin but it was never released
due to priorities. The initial implementation allows full-duplex data exchange
between Drill and Spark. That is, one can use Drill to query a data lake and
do further iterative ML on the result via Spark, or vice versa.

I will need to rework it in order to bring it to a publishable quality. Let me
know if you are interested in contributing.


-Hanifi

2016-05-27 11:23 GMT-07:00 Zhenrui(Jerry) Zhang <
zhenrui.zh...@salesforce.com>:

> Hi,
>
> Does anyone have any updates on the integration with Spark? The feature is
> mentioned in
> https://drill.apache.org/blog/2014/12/16/whats-coming-in-2015/
> and
> http://www.slideshare.net/SparkSummit/adding-complex-data-to-spark-stackneeraja-rentachintala
> Also, there is an issue open in JIRA
> (https://issues.apache.org/jira/browse/DRILL-3184) for this.
>
> Thanks,
> Jerry
>


Re: CORS for Apache Drill

2016-05-23 Thread Hanifi GUNES
Great. CrossOriginFilter ships with the jetty-servlets package, so you will
need to add that dependency to the exec module's pom file at [1].

Also, you may enjoy additional community support if you target the dev@ list
for your code/implementation-related questions.


Let me know.
-Hanifi

1: https://github.com/apache/drill/blob/master/exec/java-exec/pom.xml
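
For reference, a sketch of the dependency block to add there - the version is
a placeholder; in practice you would inherit whatever jetty version the build
already pins:

{code}
<!-- exec/java-exec/pom.xml: puts org.eclipse.jetty.servlets.CrossOriginFilter
     on the classpath. The version shown is illustrative only. -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-servlets</artifactId>
  <version>9.1.5.v20140505</version>
</dependency>
{code}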

2016-05-21 9:06 GMT-07:00 Wojciech Nowak :

> Hello!
>
> I have started working on the issue related to enabling CORS:
> Created the issue in Jira [1],
> Created a branch with an initial commit [2], but I have trouble importing the
> Jetty class CrossOriginFilter.
> Can you guide me on how to import that dependency, which should normally be
> located at org/eclipse/jetty/servlets/CrossOriginFilter.java?
> Source code of that class is at [3].
>
> [1] - https://issues.apache.org/jira/browse/DRILL-4690
> [2] -
> https://github.com/PythonicNinja/drill/commit/73d659ae23e455464a3530fd18b2eff3ba192a30
> [3] -
> https://github.com/eclipse/jetty.project/blob/master/jetty-servlets/src/main/java/org/eclipse/jetty/servlets/CrossOriginFilter.java
>
>
> —
> kind regards,
> Wojciech Nowak
>
>


Re: Connection and Memory management with Apache Drill

2016-05-20 Thread Hanifi GUNES
There is no magic formula for (1). The answer entirely depends on what sort
of data you are working with, what sort of queries you are running, and what
sort of SLAs you are willing to accept. Regardless, you can configure Drill's
memory requirements to your liking by modifying the bootstrap configuration.
However, it is important to understand that Drill treats all compute
resources the same - at least for the time being - which means a homogeneous
cluster setup is preferable.
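
For concreteness, a hedged example of the bootstrap knobs meant here - these
variables live in conf/drill-env.sh, and the sizes below are placeholders,
not recommendations:

{code}
# conf/drill-env.sh - per-Drillbit memory bootstrap (illustrative values)
export DRILL_HEAP="4G"               # JVM heap for the Drillbit
export DRILL_MAX_DIRECT_MEMORY="8G"  # ceiling for off-heap (direct) memory
{code}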

(2) is relatively easy. A user connection is kept alive until the query
completes. There are a few exceptions to this, but generally speaking a
connection is closed once the query reaches a terminal state.

(3) sounds more like a 3rd-party question than a Drill one. Drill offers a
JDBC interface that you might be interested in exploring. Perhaps you can
let the community know whether we are spring-jdbc compatible or not.


-Hanifi

2016-05-20 4:20 GMT-07:00 Khurram Faraaz :

> Hello Jasbir,
>
> Can you please elaborate on (1)
> *>> 1.   Memory and CPU requirements of Apache Drill.*
>
> I assume you already have Drill installed, since you mention that you are
> using Drill.
> Did you want to know the different memory options that Drill provides ?
> Here is a list from sys.options table.
>
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.options where name like
> '%mem%';
>
> +-----------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
> |                     name                      |   kind   |  type   |  status  |   num_val   | string_val  | bool_val  | float_val  |
> +-----------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
> | planner.memory.average_field_width            | LONG     | SYSTEM  | DEFAULT  | 8           | null        | null      | null       |
> | planner.memory.enable_memory_estimation       | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | false     | null       |
> | planner.memory.hash_agg_table_factor          | DOUBLE   | SYSTEM  | DEFAULT  | null        | null        | null      | 1.1        |
> | planner.memory.hash_join_table_factor         | DOUBLE   | SYSTEM  | DEFAULT  | null        | null        | null      | 1.1        |
> | planner.memory.max_query_memory_per_node      | LONG     | SYSTEM  | DEFAULT  | 2147483648  | null        | null      | null       |
> | planner.memory.non_blocking_operators_memory  | LONG     | SYSTEM  | DEFAULT  | 64          | null        | null      | null       |
> | planner.memory_limit                          | LONG     | SYSTEM  | DEFAULT  | 268435456   | null        | null      | null       |
> +-----------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
> 7 rows selected (0.367 seconds)
>
> You can change these configuration parameters using the ALTER SYSTEM SET or
> ALTER SESSION SET commands, depending on whether you want to make the change
> at the SYSTEM or the SESSION level.
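>
> For example (option name from the list above; the 4 GB value is just an
> assumption for this sketch):
>
> {code}
> ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296;
> {code}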
>
> Thanks,
> Khurram
>
> On Fri, May 20, 2016 at 2:58 PM,  wrote:
>
> > Can you please let me know the details. It's impacting our task.
> >
> > Regards,
> > Jasbir Singh
> >
> > From: Sing, Jasbir
> > Sent: Monday, May 16, 2016 6:19 PM
> > To: dev@drill.apache.org
> > Cc: Sareen, Nitin A. ; Kothari, Maneesh <
> > maneesh.koth...@accenture.com>; Kumar, H. P. 
> > Subject: Connection and Memory management with Apache Drill
> >
> > Hi,
> >
> > We are using Apache Drill to read parquet files. We are facing a few issues
> > and need your guidance -
> >
> > Problems -
> >
> > 1.   Memory and CPU requirements of Apache Drill.
> >
> > 2.   How does Drill manage connections for the queries fired, and when do
> > they get closed?
> >
> > 3.   Does Apache Drill provide compatibility with spring-jdbc? If yes, how
> > do we use it, and if not, how do we make a connection pool of 50
> > connections?
> >
> > Help in this regard will be appreciated.
> >
> > Regards,
> > Jasbir Singh
> >
> >
> >
>


Re: How are Vectorization and Columnar execution implemented in Drill

2016-05-20 Thread Hanifi GUNES
Not sure if you have a more specific question, but Drill basically relies on
an abstraction that we call Value Vectors. Take a look at [1]. The whole
execution pipeline operates on these columnar structures, so no row-wise
materialization is done.
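
As a rough illustration (plain Java, not Drill's actual ValueVector classes):
a varchar vector is one contiguous data buffer plus an offsets buffer, and an
operator walks straight down the column - rows are only materialized if some
consumer explicitly asks for one:

{code}
class VarCharVectorSketch {
  final byte[] data;    // all string bytes, back to back
  final int[] offsets;  // value i spans data[offsets[i] .. offsets[i+1])

  VarCharVectorSketch(byte[] data, int[] offsets) {
    this.data = data;
    this.offsets = offsets;
  }

  int valueCount() { return offsets.length - 1; }

  // Column-at-a-time work: a tight loop over one buffer, no row objects.
  long totalBytes() {
    return offsets[offsets.length - 1] - offsets[0];
  }

  String get(int i) {  // materialization happens only on demand
    return new String(data, offsets[i], offsets[i + 1] - offsets[i],
        java.nio.charset.StandardCharsets.UTF_8);
  }
}
{code}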


1: https://drill.apache.org/docs/value-vectors/

-Hanifi

2016-05-18 20:34 GMT-07:00 Yijie Shen :

> Hi folks,
>
> I'm new to Drill and find
> "Drill also provides an execution layer that performs SQL processing
> directly on columnar data without row materialization" and "vectorization
> in Drill allows the CPU to operate on vectors" on
> https://drill.apache.org/architecture/ page.
>
> I'm really curious about how these are implemented in Drill, after some
> search in the current code base, I could only find parquet files could be
> read in as vectors, how are these vectors handled then in `vectorized`
> manner? Could someone nicely point out where I can find more
> information/code on this?
>
> Thanks in advance.
>
> Yijie
>


Re: [DISCUSS] Nonblocking RPC

2016-04-06 Thread Hanifi GUNES
I would like to see Drill making less use of blocking calls as well.

Inlined:


*- I was thinking of changing acceptExternalEvents to a boolean flag, and
if not ready, resubmit cancellation and early termination requests as is to
the executor, and not a two phase event. Wouldn't this be sufficient?*

This would eliminate blocking calls, but then you would spin the CPU for no
real progress. The abstraction we are looking for here is an event-driven
serialized executor, akin to Guava's ListenableFuture but with additional
guarantees on callback execution order (I am not sure ListenableFuture
guarantees that).
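
To make the abstraction concrete, a minimal sketch of such a serialized
executor - in-order, one-at-a-time callback execution with no blocking on the
submitting thread. This is illustrative plain Java, not Drill's actual
RpcEventHandler:

{code}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

public final class SerializedExecutor implements Executor {
  private final Executor delegate;
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private boolean draining; // guarded by queue

  public SerializedExecutor(Executor delegate) { this.delegate = delegate; }

  @Override
  public void execute(Runnable task) {
    boolean startDrain;
    synchronized (queue) {
      queue.add(task);
      startDrain = !draining;
      draining = true;
    }
    if (startDrain) {
      delegate.execute(this::drain); // never blocks the caller
    }
  }

  // Runs tasks strictly in submission order; at most one drain is active,
  // which is what gives the callback-ordering guarantee.
  private void drain() {
    for (;;) {
      Runnable next;
      synchronized (queue) {
        next = queue.poll();
        if (next == null) { draining = false; return; }
      }
      next.run();
    }
  }
}
{code}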


-Hanifi


2016-04-04 21:57 GMT-07:00 Sudheesh Katkam :

> Yes, resubmit to the serialized executor (RpcEventHandler). I wasn’t
> thinking of changing the channel pipeline, if this is what you mean by
> breaking the encapsulation model.
>
> Maybe handlers can get to this executor through a context object. Right
> now, this will not work with the SameExecutor (stack overflow).
>
> I was thinking of changing acceptExternalEvents to a boolean flag, and if
> not ready, resubmit cancellation and early termination requests as is to
> the executor, and not a two phase event. Wouldn't this be sufficient?
>
> Thank you,
> Sudheesh
>
> P.S. I hope to cleanup and document RPC related code on the way.
>
> > On Apr 1, 2016, at 3:21 PM, Jacques Nadeau  wrote:
> >
> > I think you're going to really have to break the encapsulation model to
> > accomplish this in the RPC layer.  What about updating the serialized
> > executor for those situations to do a resubmission rather than blocking
> > operation? Basically, it seems like we want a two phase termination:
> > request termination and then confirm termination. It seems like both
> should
> > be non-blocking
> >
> > The other option is to rethink the model around termination. It might be
> > worth a hangout brainstorm to see if we can come up with ideas that are
> > more outside of the box.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Apr 1, 2016 at 2:28 PM, Sudheesh Katkam 
> wrote:
> >
> >> Hey y’all,
> >>
> >> There are some blocking requests that could make an event loop *await
> >> uninterruptibly*. At this point, the Drillbit might seem unresponsive.
> This
> >> is worsened if the event loop is not unblocked (due to a bug), which
> >> requires a Drillbit restart. Although Drill supports *offloading from
> the
> >> event loop* (experimental), this is not sufficient as the thread
> handling
> >> the queue of requests would still block.
> >>
> >> AFAIK there are two such requests:
> >> + when the user cancels
> >> <
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java#L1184
> >>>
> >> the query during planning
> >> + a fragment is canceled
> >> <
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java#L150
> >>>
> >> or terminated
> >> <
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java#L501
> >>>
> >> early during setup
> >>
> >> I think a simple solution would be to *re-queue* such requests
> >> (possible in
> >> above cases). That way other requests get their chance, and all requests
> >> would be eventually handled. Thoughts?
> >>
> >> Thank you,
> >> Sudheesh
> >>
>
>


Re: Including Drill and Hadoop jars in the same program

2016-04-06 Thread Hanifi Gunes
Shading [1] might be an option here.

Thanks.
-Hanifi

1: https://maven.apache.org/plugins/maven-shade-plugin/
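
As a hedged illustration of what I mean - relocating Guava inside the AM jar
so Drill's and Hadoop's versions stop colliding; the shaded pattern and
plugin version are assumptions, not a tested configuration:

{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava packages (and references to them) in the shaded jar -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}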


On Thu, Mar 31, 2016 at 6:04 PM, Paul Rogers  wrote:

> Hi All,
>
> Here’s an obscure question for the expert developers…
>
> We’re developing the YARN Application Master (AM) for Drill. We’d like to
> monitor Drill’s ZooKeeper (ZK) Drill-bit registrations. Since the ZK
> entries are in Protobuf format, and Drill already has classes to do the
> monitoring, the logical solution is just to use the Drill code in the AM.
>
> The problem is, the dependencies (such as Guava version) for Drill differ
> from those of Hadoop. Simply including Drill and YARN libraries in the same
> build triggers runtime failures due to Guava version incompatibilities.
>
> One solution is to create a custom class loader for the Drill classes, but
> that introduces a different set of complexities.
>
> Suggestions?
>
> Thanks,
>
> - Paul
>
>


Re: [DISCUSS] Remove required type

2016-03-22 Thread Hanifi GUNES
My major concern here too would be possible performance implications. That
being said, I can see ways to speed up execution by relying on vector
density (for instance, count); I am not sure how batch density would work.
Perhaps an example would throw some more light.

Why don't we think about some "good bad cases" to evaluate the performance
impact? I wonder to what degree performance would degrade (if any) going
from required to optional.

Also, a big chunk of code to handle the required type is already in. Any
particular reason to remove it?
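
To make the density idea concrete, a hypothetical sketch (plain Java, not
Drill's generated code) of what a batch-level null count could buy:

{code}
class NullableIntBatchSketch {
  // values[i] is valid iff bit i of validity is set; nullCount comes from
  // batch-level metadata populated when the batch was built.
  static long sum(int[] values, long[] validity, int nullCount) {
    long total = 0;
    if (nullCount == 0) {
      for (int v : values) total += v;  // "required"-style loop, no null checks
    } else {
      for (int i = 0; i < values.length; i++) {
        if ((validity[i >>> 6] & (1L << (i & 63))) != 0) total += values[i];
      }
    }
    return total;
  }
}
{code}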


-Hanifi


2016-03-22 13:36 GMT-07:00 Jacques Nadeau :

> My suggestion is we use explicit observation at the batch level. If there
> are no nulls we can optimize this batch. This would ultimately improve over
> our current situation where most parquet and all json data is nullable so
> we don't optimize. I'd estimate that the vast majority of Drills workloads
> are marked nullable whether they are or not. So what we're really
> suggesting is deleting a bunch of code which is rarely in the execution
> path.
> On Mar 22, 2016 1:22 PM, "Aman Sinha"  wrote:
>
> > I was thinking about it more after sending the previous concerns.  Agree,
> > this is an execution side change...but some details need to be worked
> out.
> > If the planner indicates to the executor that a column is non-nullable
> (e.g
> > a primary key),  the run-time generated code is more efficient since it
> > does not have to check the null bit.  Are you thinking we would use the
> > existing nullable vector and add some additional metadata (at a record
> > batch level rather than record level) to indicate non-nullability ?
> >
> >
> > On Tue, Mar 22, 2016 at 12:27 PM, Jacques Nadeau 
> > wrote:
> >
> > > Hey Aman, I believe both Steven and I were only suggesting removal only
> > > from execution, not planning. It seems like your concerns are all
> related
> > > to planning. It seems like the real tradeoffs in execution are
> nominal.
> > > On Mar 22, 2016 9:03 AM, "Aman Sinha"  wrote:
> > >
> > > > While it is true that there is code complexity due to the required
> > type,
> > > > what would we be trading off ?  some important considerations:
> > > >   - We don't currently have null count statistics which would need to
> > be
> > > > implemented for various data sources
> > > >   - Primary keys in the RDBMS sources (or rowkeys in hbase) are
> always
> > > > non-null, and although today we may not be doing optimizations to
> > > leverage
> > > > that,  one could easily add a rule that converts  WHERE primary_key
> IS
> > > NULL
> > > > to a FALSE filter.
> > > >
> > > >
> > > > On Tue, Mar 22, 2016 at 7:31 AM, Dave Oshinsky <
> > doshin...@commvault.com>
> > > > wrote:
> > > >
> > > > > Hi Jacques,
> > > > > Marginally related to this, I made a small change in PR-372
> > > (DRILL-4184)
> > > > > to support variable widths for decimal quantities in Parquet.  I
> > found
> > > > the
> > > > > (decimal) vectoring code to be very difficult to understand
> (probably
> > > > > because it's overly complex, but also because I'm new to Drill code
> > in
> > > > > general), so I made a small, surgical change in my pull request to
> > > > support
> > > > > keeping track of variable widths (lengths) and null booleans within
> > the
> > > > > existing fixed width decimal vectoring scheme.  Can my changes be
> > > > > reviewed/accepted, and then we discuss how to fix properly
> long-term?
> > > > >
> > > > > Thanks,
> > > > > Dave Oshinsky
> > > > >
> > > > > -Original Message-
> > > > > From: Jacques Nadeau [mailto:jacq...@dremio.com]
> > > > > Sent: Monday, March 21, 2016 11:43 PM
> > > > > To: dev
> > > > > Subject: Re: [DISCUSS] Remove required type
> > > > >
> > > > > Definitely in support of this. The required type is a huge
> > maintenance
> > > > and
> > > > > code complexity nightmare that provides little to no benefit. As
> you
> > > > point
> > > > > out, we can do better performance optimizations though null count
> > > > > observation since most sources are nullable anyway.
> > > > > On Mar 21, 2016 7:41 PM, "Steven Phillips" 
> > wrote:
> > > > >
> > > > > > I have been thinking about this for a while now, and I feel it
> > would
> > > > > > be a good idea to remove the Required vector types from Drill,
> and
> > > > > > only use the Nullable version of vectors. I think this will
> greatly
> > > > > simplify the code.
> > > > > > It will also simplify the creation of UDFs. As is, if a function
> > has
> > > > > > custom null handling (i.e. INTERNAL), the function has to be
> > > > > > separately implemented for each permutation of nullability of the
> > > > > > inputs. But if drill data types are always nullable, this
> wouldn't
> > > be a
> > > > > problem.
> > > > > >
> > > > > > I don't think there would be much impact on performance. In
> > practice,
> > > > > > I think the required type is used very rarely. And there 

Re: [ANNOUNCE] Apache Drill 1.6.0 released

2016-03-19 Thread Hanifi Gunes
I am unable to find 1.6 at maven [1]. Did we publish maven artifacts?

Thanks.
-Hanifi

1: http://mvnrepository.com/artifact/org.apache.drill.exec/drill-java-exec

On Wed, Mar 16, 2016 at 10:09 AM, Parth Chandra  wrote:

> On behalf of the Apache Drill community, I am happy to announce the
> release of Apache Drill 1.6.0.
>
> The source and binary artifacts are available at [1]
> Review a complete list of fixes and enhancements at [2]
>
> This release of Drill fixes many issues and introduces a number of
> enhancements, including inbound impersonation, support for JDK 1.8, and
> additional custom window frames.
>
> Thanks to everyone in the community who contributed in this release.
>
> [1] http://drill.apache.org/download/
> [2] http://drill.apache.org/docs/apache-drill-1-6-0-release-notes/
>


Re: [ANNOUNCE] Apache Drill 1.6.0 released

2016-03-18 Thread Hanifi Gunes
False alarm. Looks like this particular mirror was not up to date. [1] has
it all.


Thanks.
-Hanifi

1: http://repo1.maven.org/maven2/org/apache/drill/exec/drill-java-exec/

On Thu, Mar 17, 2016 at 4:48 PM, Hanifi Gunes <hgu...@maprtech.com> wrote:

> I am unable to find 1.6 at maven [1]. Did we publish maven artifacts?
>
> Thanks.
> -Hanifi
>
> 1: http://mvnrepository.com/artifact/org.apache.drill.exec/drill-java-exec
>
> On Wed, Mar 16, 2016 at 10:09 AM, Parth Chandra <par...@apache.org> wrote:
>
>> On behalf of the Apache Drill community, I am happy to announce the
>> release of Apache Drill 1.6.0.
>>
>> The source and binary artifacts are available at [1]
>> Review a complete list of fixes and enhancements at [2]
>>
>> This release of Drill fixes many issues and introduces a number of
>> enhancements, including inbound impersonation, support for JDK 1.8, and
>> additional custom window frames.
>>
>> Thanks to everyone in the community who contributed in this release.
>>
>> [1] http://drill.apache.org/download/
>> [2] http://drill.apache.org/docs/apache-drill-1-6-0-release-notes/
>>
>
>


Re: Time for the 1.6 Release

2016-03-03 Thread Hanifi Gunes
DRILL-4416 is going to make it into 1.7. The patch causes a leak and I will
look into it once I can make some time, but 1.6 is too early.

On Thu, Mar 3, 2016 at 6:30 PM, Parth Chandra  wrote:

> Here's an updated list with names of reviewers added. If anyone else is
> reviewing the open PRs please let me know. Some PRs have owners names that
> I will follow up with.
> Jason, I've included your JIRA in the list.
>
>
> Committed for 1.6 -
>
> DRILL-4384 - Query profile is missing important information on WebUi -
> Merged
> DRILL-3488/pr 388 (Java 1.8 support) - Merged.
> DRILL-4410/pr 380 (listvector should initialize bits...) - Merged
> DRILL-4383/pr 375 (Allow custom configs for S3, Kerberos, etc) - Merged
> DRILL-4465/pr 401 (Simplify Calcite parsing & planning integration) -
> Waiting to be merged
>
> DRILL-4281/pr 400 (Drill should support inbound impersonation) (Sudheesh to
> review)
> DRILL-4372/pr 377(?) (Drill Operators and Functions should correctly expose
> their types within Calcite.) - Waiting for Aman to review. (Owners: Hsuan,
> Jinfeng, Aman, Sudheesh)
> DRILL-4313/pr 396  (Improved client randomization. Update JIRA with
> warnings about using the feature ) (Sudheesh to review.)
> DRILL-4437 (and others)/pr 394 (Operator unit test framework). (Parth to
> review)
> DRILL-4449/pr 389 (Wrong results when metadata cache is used..) (Aman to
> review)
> DRILL-4416/pr 385 (quote path separator) (Owner: Hanifi)
> DRILL-4069/pr 352 Enable RPC thread offload by default (Owner: Sudheesh)
>
> Need review -
> DRILL-4375/pr 402 (Fix the maven release profile)
> DRILL-4452/pr 395 (Update Avatica Driver to latest Calcite)
> DRILL-4332/pr 389 (Make vector comparison order stable in test framework)
> DRILL-4411/pr 381 (hash join over-memory condition)
> DRILL-4387/pr 379 (GroupScan should not use star column)
> DRILL-4184/pr 372 (support variable length decimal fields in parquet)
> DRILL-4120 - dir0 does not work when the directory structure contains Avro
> files - Partial patch available.
> DRILL-4203/pr 341 (fix dates written into parquet files to conform to
> parquet format spec)
>
> Not included (yet) -
> DRILL-3149 - No patch available
> DRILL-4441 - IN operator does not work with Avro reader - No patch
> available
> DRILL-3745/pr 399 - Hive char support - New feature - Needs QA - Not
> included in 1.6
> DRILL-3623 - Limit 0 should avoid execution when querying a known schema.
> (Need to add limitations of current impl). Intrusive change; should be
> included at beginning of release cycle.
>
> Others -
> DRILL-2517   - Already resolved.
> DRILL-3688/pr 382 (skip.header.line.count in hive). - Already merged. PR
> needs to be closed.
>
>
>
> Thanks
>
> Parth
>
>
>
>
>
> On Thu, Mar 3, 2016 at 3:21 PM, Jason Altekruse 
> wrote:
>
> > I have updated the PR for the parquet date corruption issue that didn't
> > make it into 1.5.
> >
> > https://github.com/apache/drill/pull/341
> > https://issues.apache.org/jira/browse/DRILL-4203
> >
> > If this can get reviewed, I think it would be good to get into the
> release.
> > Any takers?
> >
> > On Wed, Mar 2, 2016 at 11:07 PM, Parth Chandra 
> wrote:
> >
> > > I've summarized the list of JIRs below.
> > > The first set of pull requests is under review (or have some reviewer
> > > assigned).
> > > The second set contains pull requests that need review. We need
> > committers
> > > to review these. Please volunteer or these will not be able to make it
> > into
> > > the release.
> > > The third set is Jira's that do not have a patch and/or should not be
> > > included because they require deeper scrutiny.
> > > I'm hoping we can finalize the list of PRs that can be reviewed by
> Friday
> > > morning and possibly *finalize the list of issues to be included by
> > Friday
> > > end of day* so please take some time to review the PRs.
> > > Also note that the QA team has offered to do sanity testing once we
> > decide
> > > on the final commit to be included, before the release candidate is
> > rolled
> > > out, which helps with the release candidate moving forward smoothly.
> > >
> > > Here's the list -
> > >
> > > *Committed for 1.6 -*
> > > DRILL-4281/pr 400 (Drill should support inbound impersonation)
> > > DRILL-4372/pr 377(?) (Drill Operators and Functions should correctly
> > expose
> > > their types within Calcite.) - Waiting for Aman to review.
> > > DRILL-4313/pr 396  (Improved client randomization. Update JIRA with
> > > warnings about using the feature ) Sudheesh to review.
> > > DRILL-3488/pr 388 (Java 1.8 support) Hanifi to review
> > > DRILL-4437 (and others)/pr 394 (Operator unit test framework). Parth to
> > > review
> > > DRILL-4384 - Query profile is missing important information on WebUi -
> > > Marked as resolved. Patch not applied?
> > >
> > > *Need review -*
> > > DRILL-4465/pr 401 (Simplify Calcite parsing & planning integration)
> > > DRILL-4375/pr 402 (Fix the maven release profile)
> > > 

Re: Apache drill on Android devices!

2016-03-03 Thread Hanifi GUNES
Interesting. I am wondering whether this is a fun project or whether someone
has serious intentions to build a Drill cluster of droids. Either way, good
luck with Dalvik and unsafe. NoSQL on Android has certain advantages and use
cases, but I am wondering how Drill, as a query engine, would come in handy.
Thoughts?


-Hanifi

2016-03-03 10:18 GMT-08:00 Jason Altekruse :

> No one has tried to do this yet. The first question I would investigate is
> whether Android supports Java direct memory, which Drill makes extensive use
> of, but which is not considered a standard feature in all implementations of
> the JVM. A quick glance at the docs seems to indicate that it is supported
> (the allocateDirect() method is what we are interested in here) [1]. I am not
> aware of all of the differences between Android and standard Java, so there
> may be other major hurdles. It looks like Android now has Java 7 support,
> which is the version that we currently develop Drill against.
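>
> A minimal probe for that piece - runnable on any JVM, and worth trying first
> on ART/Dalvik; the class name is made up for this sketch:
>
> {code}
> import java.nio.ByteBuffer;
>
> public class DirectMemoryProbe {
>   public static void main(String[] args) {
>     // Drill's buffers live off-heap; allocateDirect is the underlying primitive.
>     ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB
>     System.out.println("direct=" + buf.isDirect() + " capacity=" + buf.capacity());
>   }
> }
> {code}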
>
> You can certainly give it a shot, but I doubt it will just work. Drill has
> a lot of dependencies, which I believe you would also have to re-compile.
> That could end up being quite a task itself. If you decide to try it feel
> free to ask questions here and we'll try to help out the best we can.
>
> [1] - http://developer.android.com/reference/java/nio/ByteBuffer.html
>
> - Jason
>
> On Thu, Mar 3, 2016 at 6:16 AM, Sandeep Choudhary <
> schoudh...@gofalconsmart.com> wrote:
>
> > Dear Team,
> >
> > I am looking to run Apache Drill on Android (Linux + Java based) devices;
> > these devices have 4-8 cores, 2-4 GB of RAM, and storage speeds close to
> > SSD speed!
> >
> > Is there any way to compile or run it?
> >
> > I think it would be a great advantage for Apache Drill too; many NoSQL
> > providers have started supporting this, but they are not good enough.
> >
> > Looking for a positive response.
> >
> > Best,
> > Sandeep Choudhary
> >
>


Re: Parallelization & Threading

2016-03-01 Thread Hanifi GUNES
@Jinfeng & Aman

The proposal here is basically async execution. Pull or push are both fine.
The main requirement is to eliminate blocking calls on incoming and outgoing
buffers. In the pull model, as we discussed yesterday with some of the
Drillers, NOT_YET is the only state that needs to be handled thoroughly by
operators. I do not see any state keeping besides NOT_YET in the DAG. I am
not sure what additional stuff Jacques is referring to by unwinding/rewinding
states.

@Jinfeng
1: The new model is async. Regardless of whether it is pull or push, only a
single thread works on a fragment at a time. However, thread and fragment are
decoupled in the new model. That is, the same thread could execute another
fragment on the next tick. This substantially reduces the number of threads
spawned, from thousands to a fraction of the number of available cores.

2: No. Each task runs through the entire fragment/operator tree.

3: Yes. Please refer to my analysis at [1], where Drill ends up with a churn
of context switches (~90,000/sec) for some time window (about a minute),
causing threads to starve, timing out ZooKeeper, and eventually choking the
entire system.
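
To make (1) concrete, a hypothetical sketch of the task model - the names are
made up, and a real version would gate NOT_YET resubmission on a readiness
callback instead of retrying immediately:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FragmentScheduler {
  enum Outcome { OK, NOT_YET, DONE }

  interface Fragment { Outcome runOnce(); }  // one bounded slice of work

  // One fixed pool for all queries, sized to the cores - not a thread per fragment.
  private final ExecutorService pool =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

  public void schedule(Fragment f) {
    pool.execute(() -> {
      if (f.runOnce() != Outcome.DONE) {
        schedule(f);  // yield this thread; any fragment may run on the next tick
      }
    });
  }
}
{code}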


Thanks.
-Hanifi

1: https://issues.apache.org/jira/browse/DRILL-4325


2016-03-01 11:36 GMT-08:00 Hanifi GUNES <hanifigu...@gmail.com>:

> Do you want to elaborate on, and possibly walk through, an example of how
> shouldContinue(...) behaves at fragment boundaries (entry/exit) and in the
> middle, considering back-pressure, inner pull loops like hash join, blocking
> semantics, etc.?
>
>
> Thanks.
> -Hanifi
>
>
>
> 2016-02-29 22:15 GMT-08:00 Neeraja Rentachintala <
> nrentachint...@maprtech.com>:
>
>> Jacques
>> can you provide more context on what user/customer problem these changes
>> that you & Hanifi discussed are trying to solve.
>> Is it part of the better resource utilization or concurrency/multi tenancy
>> handling or both.
>> It will help to understand that as a background for the discussion.
>>
>> -Neeraja
>>
>> On Mon, Feb 29, 2016 at 9:36 PM, Jacques Nadeau <jacq...@dremio.com>
>> wrote:
>>
>> > Hanifi and I had a great conversation late last week about how Drill
>> > currently provides parallelization. Hanifi suggested we move to a model
>> > whereby there is a fixed threadpool for all Drill work and we treat all
>> > operator and/or fragment operations as tasks that can be scheduled
>> within
>> > that pool. This would serve the following purposes:
>> >
>> > 1. reduce the number of threads that Drill creates
>> > 2. Decrease wasteful context switching (especially in high concurrency
>> > scenarios)
>> > 3. Provide more predictable slas for Drill infrastructure tasks such as
>> > heartbeats/rpc and cancellations/planning and queue management/etc (a
>> key
>> > hot-button for Vicki :)
>> >
>> > For reference, this is already the threading model we use for the RPC
>> > threads and is a fairly standard asynchronous programming model. When
>> > Hanifi and I met, we brainstormed on what types of changes might need
>> to be
>> > done and ultimately thought that in order to do this, we'd realistically
>> > want to move iterator trees from a pull model to a push model within a
>> > node.
>> >
>> > After spending more time thinking about this idea, I had the following
>> > thoughts:
>> >
>> > - We could probably accomplish the same behavior staying with a pull model
>> > and using the IterOutcome.NOT_YET to return.
>> > - In order for this to work effectively, all code would need to be
>> > non-blocking (including reading from disk, writing to socket, waiting
>> for
>> > zookeeper responses, etc)
>> > - Task length (or coarseness) would be need to be quantized
>> appropriately.
>> > While operating at the RootExec.next() might be attractive, it is too
>> > coarse to get reasonable sharing and we'd need to figure out ways to
>> have
>> > time-based exit within operators.
>> > - With this approach, one of the biggest challenges would be reworking
>> all
>> > the operators to be able to unwind the stack to exit execution (to yield
>> > their thread).
>> >
>> > Given those challenges, I think there may be another, simpler solution
>> that
>> > could cover items 2 & 3 above without dealing with all the issues that
>> we
>> > would have to deal with in the proposal that Hanifi suggested. At its
>> core,
>> > I see the biggest issue is dealing with the unwinding/rewinding that
>> would
>> > be required to 

Re: Parallelization & Threading

2016-03-01 Thread Hanifi GUNES
Do you want to elaborate on, and possibly walk through, an example of how
shouldContinue(...) behaves at fragment boundaries (entry/exit) and in the
middle, considering back-pressure, inner pull loops like hash join, blocking
semantics, etc.?


Thanks.
-Hanifi



2016-02-29 22:15 GMT-08:00 Neeraja Rentachintala <
nrentachint...@maprtech.com>:

> Jacques
> can you provide more context on what user/customer problem these changes
> that you & Hanifi discussed are trying to solve.
> Is it part of the better resource utilization or concurrency/multi tenancy
> handling or both.
> It will help to understand that as a background for the discussion.
>
> -Neeraja
>
> On Mon, Feb 29, 2016 at 9:36 PM, Jacques Nadeau 
> wrote:
>
> > Hanifi and I had a great conversation late last week about how Drill
> > currently provides parallelization. Hanifi suggested we move to a model
> > whereby there is a fixed threadpool for all Drill work and we treat all
> > operator and/or fragment operations as tasks that can be scheduled within
> > that pool. This would serve the following purposes:
> >
> > 1. reduce the number of threads that Drill creates
> > 2. Decrease wasteful context switching (especially in high concurrency
> > scenarios)
> > 3. Provide more predictable slas for Drill infrastructure tasks such as
> > heartbeats/rpc and cancellations/planning and queue management/etc (a key
> > hot-button for Vicki :)
> >
> > For reference, this is already the threading model we use for the RPC
> > threads and is a fairly standard asynchronous programming model. When
> > Hanifi and I met, we brainstormed on what types of changes might need to
> be
> > done and ultimately thought that in order to do this, we'd realistically
> > want to move iterator trees from a pull model to a push model within a
> > node.
> >
> > After spending more time thinking about this idea, I had the following
> > thoughts:
> >
> > - We could probably accomplish the same behavior staying with a pull model
> > and using the IterOutcome.NOT_YET to return.
> > - In order for this to work effectively, all code would need to be
> > non-blocking (including reading from disk, writing to socket, waiting for
> > zookeeper responses, etc)
> > - Task length (or coarseness) would be need to be quantized
> appropriately.
> > While operating at the RootExec.next() might be attractive, it is too
> > coarse to get reasonable sharing and we'd need to figure out ways to have
> > time-based exit within operators.
> > - With this approach, one of the biggest challenges would be reworking
> all
> > the operators to be able to unwind the stack to exit execution (to yield
> > their thread).
> >
> > Given those challenges, I think there may be another, simpler solution
> that
> > could cover items 2 & 3 above without dealing with all the issues that we
> > would have to deal with in the proposal that Hanifi suggested. At its
> core,
> > I see the biggest issue is dealing with the unwinding/rewinding that
> would
> > be required to move between threads. This is very similar to how we
> needed
> > to unwind in the case of memory allocation before we supported realloc
> and
> > causes substantial extra code complexity. As such, I suggest we use a
> pause
> > approach that uses something similar to a semaphore for the number of
> > active threads we allow. This could be done using the existing
> > shouldContinue() mechanism where we suspend or reacquire thread use as we
> > pass through this method. We'd also create some alternative
> > shouldContinue
> > methods such as shouldContinue(Lock toLock) and shouldContinue(Queue
> > queueToTakeFrom), etc so that shouldContinue would naturally wrap
> blocking
> > calls with the right logic. This would be a fairly simple set of changes
> > and we could see how well it improves issues 2 & 3 above.
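> >
> > A hypothetical sketch of that wrapping (made-up names): the permit is
> > released while the call may block and reacquired before the fragment
> > proceeds, capping truly active threads without unwinding any stacks.
> >
> > {code}
> > import java.util.concurrent.Semaphore;
> > import java.util.concurrent.locks.Lock;
> >
> > class ThreadGate {
> >   private final Semaphore activeThreads;
> >
> >   ThreadGate(int maxActive) { activeThreads = new Semaphore(maxActive); }
> >
> >   // Yield our execution slot around a potentially blocking call.
> >   void shouldContinue(Lock toLock) throws InterruptedException {
> >     activeThreads.release();       // let another fragment run meanwhile
> >     try {
> >       toLock.lockInterruptibly();  // the blocking operation being wrapped
> >     } finally {
> >       activeThreads.acquire();     // rejoin the active set
> >     }
> >   }
> > }
> > {code}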
> >
> > On top of this, I think we still need to implement automatic
> > parallelization scaling of the cluster. Even a rudimentary monitoring of
> > cluster load and parallel reduction of max_width_per_node would
> > substantially improve the behavior of the cluster under heavy concurrent
> > loads. (And note, I think that this is required no matter what we
> implement
> > above.)
> >
> > Thoughts?
> > Jacques
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
>


[jira] [Created] (DRILL-4416) Quote path separator for windows

2016-02-19 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4416:
---

 Summary: Quote path separator for windows
 Key: DRILL-4416
 URL: https://issues.apache.org/jira/browse/DRILL-4416
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Windows uses backslash as its path separator. We need to do string
manipulation using the separator, during which the separator must be quoted.
This issue proposes (i) creating a global static path separator variable in
common, (ii) removing all others, and (iii) using the quoted separator where
need be.
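
For illustration only (not the actual patch), the kind of call site this is
about:

{code}
// "\" inside a split() pattern is a regex escape; quoting makes it literal.
String[] parts = path.split(java.util.regex.Pattern.quote(java.io.File.separator));
{code}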



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4386) Unify and move serialization logic to common module or package

2016-02-15 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4386:
---

 Summary: Unify and move serialization logic to common module or 
package
 Key: DRILL-4386
 URL: https://issues.apache.org/jira/browse/DRILL-4386
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Affects Versions: 1.5.0
Reporter: Hanifi Gunes
Priority: Minor


In many places around Drill we rely on custom SerDes. However, there seems to
be some redundancy. For instance, Transient/PersistentStore (introduced by
DRILL-4275) relies on InstanceSerializer, whereas Controller uses the
CustomSerDe interface. Effectively these two are the same. We need to unify
the use cases, possibly moving the SerDe logic to the common module or some
level up in the exec module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4381) Replace direct uses of FileSelection c'tor with create()

2016-02-09 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4381:
---

 Summary: Replace direct uses of FileSelection c'tor with create()
 Key: DRILL-4381
 URL: https://issues.apache.org/jira/browse/DRILL-4381
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


We should avoid direct creation of FileSelection. This patch proposes either a 
re-design or removing instances where FileSelection c'tor is used directly. We 
also need more documentation around FileSelection abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4371) Enhance scan to report file and column name on failure.

2016-02-08 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4371:
---

 Summary: Enhance scan to report file and column name on failure.
 Key: DRILL-4371
 URL: https://issues.apache.org/jira/browse/DRILL-4371
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Hanifi Gunes


ScanBatch does not seem to report file and column name for some failure 
scenarios. One such case was pointed out by John on user list at this 
[thread|https://mail-archives.apache.org/mod_mbox/drill-user/201602.mbox/%3CCAKOFcwqLy%3D26LVKokm7EWizoZdYXafqH0RMXK-oYrpQkq5BELQ%40mail.gmail.com%3E].
 We should improve upon failure cases so as to provide more context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Naming the new ValueVector Initiative

2016-01-20 Thread Hanifi GUNES
Awesome! Thanks for the heavy lifting, Jacques.

2016-01-20 15:30 GMT-08:00 Wes McKinney :

> Fantastic!
>
> Really looking forward to working more with everyone.
>
> Thanks to Jacques for stewarding the process. This is a really important
> step for the ecosystem.
>
> On Wed, Jan 20, 2016 at 3:29 PM, Reynold Xin  wrote:
>
>> Thanks and great job driving this, Jacques!
>>
>>
>> On Wed, Jan 20, 2016 at 3:28 PM, Jacques Nadeau 
>> wrote:
>>
>>> Hey Everyone,
>>>
>>> Good news! The Apache board has approved the Apache Arrow as a new TLP.
>>> I've asked the Apache INFRA team to set up required resources so we can
>>> start moving forward (ML, Git, Website, etc).
>>>
>>> I've started working on a press release to announce the Apache Arrow
>>> project and will circulate a draft shortly. Once the project mailing lists
>>> are established, we can move this thread over there to continue
>>> discussions. They had us do one of change to the proposal during the board
>>> call which was to remove the initial committers (separate from initial
>>> pmc). Once we establish the PMC list, we can immediately add the additional
>>> committers as our first PMC action.
>>>
>>> thanks to everyone!
>>> Jacques
>>>
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>> On Tue, Jan 12, 2016 at 11:03 PM, Julien Le Dem 
>>> wrote:
>>>
 +1 on a repo for the spec.
 I do have questions as well.
 In particular for the metadata.

 On Tue, Jan 12, 2016 at 6:59 PM, Wes McKinney  wrote:

> On Tue, Jan 12, 2016 at 6:21 PM, Parth Chandra 
> wrote:
> >
> >
> > On Tue, Jan 12, 2016 at 9:57 AM, Wes McKinney 
> wrote:
> >>
> >>
> >> >
> >> > As far as the existing work is concerned, I'm not sure everyone
> is aware
> >> > of
> >> > the C++ code inside of Drill that can represent at least the
> scalar
> >> > types in
> >> > Drill's existing Value Vectors [1]. This is currently used by the
> native
> >> > client written to hook up an ODBC driver.
> >> >
> >>
> >> I have read this code. From my perspective, it would be less work to
> >> collaborate on a self-contained implementation that closely models
> the
> >> Arrow / VV spec that includes builder classes and its own memory
> >> management without coupling to Drill details. I started prototyping
> >> something here (warning: only a few actual days of coding here):
> >>
> >> https://github.com/arrow-data/arrow-cpp/tree/master/src/arrow
> >>
> >> For example, you can see an example constructing an Array or
> >> String (== Array) column in the tests here
> >>
> >>
> >>
> https://github.com/arrow-data/arrow-cpp/blob/master/src/arrow/builder-test.cc#L328
> >>
> >> I've been planning to use this as the basis of a C++ Parquet
> >> reader-writer and the associated Python pandas-like layer which
> >> includes in-memory analytics on Arrow data structures.
> >>
> >> > Parth who is included here has been the primary owner of this C++
> code
> >> > throughout it's life in Drill. Parth, what do you think is the
> best
> >> > strategy
> >> > for managing the C++ code right now? As the C++ build is not tied
> into
> >> > the
> >> > Java one, as I understand it we just run it manually when updates
> are
> >> > made
> >> > there and we need to update ODBC. Would it be disruptive to move
> the
> >> > code to
> >> > the arrow repo? If so, we could include Drill as a submodule in
> the new
> >> > repo, or put Wes's work so far in the Drill repo.
> >>
> >> If we can enumerate the non-Drill-client related parts (i.e. the
> array
> >> accessors and data structures-oriented code) that would make sense
> in
> >> a standalone Arrow library it would be great to start a side
> >> discussion about the design of the C++ reference implementation
> >> (metadata / schemas, IPC, array builders and accessors, etc.). Since
> >> this is a quite urgent for me (intending to deliver a minimally
> viable
> >> pandas-like Arrow + Parquet in Python stack in the next ~3 months)
> it
> >> would be great to do this sooner rather than later.
> >>
> >
> > Most of the code for  Drill C++ Value Vectors is independent of
> Drill -
> > mostly the code upto line 787 in this file -
> >
> https://github.com/apache/drill/blob/master/contrib/native/client/src/include/drill/recordBatch.hpp
> >
> > My thought was to leave the Drill implementation alone and borrow
> copiously
> > from it when convenient for Arrow. Seems like we can still do that
> building
> > on Wes' work.
> >
>
> Makes sense. Speaking of code, would you all like me to set up a
> 

resource manager proposal -- initial write-ups

2016-01-20 Thread Hanifi GUNES
Folks,

I have been working on designing the new resource manager. I have a *moving*
design document living at [1]. Note that this is purely a work in progress
and has a lot of incomplete pieces. In the meantime, however, you can get the
broad idea of what we are targeting there, as well as drop your feedback on
the side for further clarification, suggestions, etc.


Cheers.
-Hanifi

1: https://goo.gl/rpcjVR


Re: [DISCUSS] DRILL-4132

2016-01-19 Thread Hanifi Gunes
Do you want to create a doodle for this? [1]

-Hanifi

1: http://doodle.com/create

On Mon, Jan 18, 2016 at 11:02 PM, yuliya Feldman <
yufeld...@yahoo.com.invalid> wrote:

>
> Hello here,
> I wanted to start discussion on [1]
> Would be nice to have a hangout session with @jacques-n,
> @hnfgns, @StevenMPhillips
> Let me know a suitable time.
> Thanks,
> Yuliya
> [1] https://issues.apache.org/jira/browse/DRILL-4132


Re: query hanging in CANCELLATION_REQUEST

2016-01-19 Thread Hanifi Gunes
I had reported this problem verbally sometime last year. I don't remember
creating a JIRA, though. In general, I dislike this sort of blocking call
anywhere in the execution, even though one could argue it simplifies the
code flow.

A JIRA would be appreciated.

On Tue, Jan 19, 2016 at 11:10 AM, Abdel Hakim Deneche  wrote:

> I was running a query with a hash join that was generating lot's of
> results. I cancelled the query from sqlline then closed it. Now the query
> is stuck in CANCELLATION_REQUEST state.
>
> Looking at jstack it looks like screenRoot is blocked waiting for data sent
> to the client to be acknowledged.
>
> Do we have a JIRA similar to this ?
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>


[jira] [Created] (DRILL-4283) Experimental union type support is broken around ComplexWriterImpl

2016-01-19 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4283:
---

 Summary: Experimental union type support is broken around 
ComplexWriterImpl
 Key: DRILL-4283
 URL: https://issues.apache.org/jira/browse/DRILL-4283
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Priority: Critical


VectorAccessibleComplexWriter#getWriter does not pass along the union flag 
while creating ComplexWriterImpl. This c'tor assumes union type is disabled.

To reproduce: (i) enable union type, (ii) run a query over data with changing 
schema that uses ComplexWriterImpl, e.g. convert_from(..., 'JSON'). The query 
should then fail as though union type were not enabled.
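
A sketch of the repro in SQL, assuming the current option name; the column
name and path are made up:

{code}
ALTER SESSION SET `exec.enable_union_type` = true;
-- any directory whose field types change across files will do here
SELECT convert_from(json_col, 'JSON') FROM dfs.`/path/with/changing/schema`;
{code}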



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4186) queue enhancements [umbrella]

2015-12-11 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4186:
---

 Summary: queue enhancements [umbrella]
 Key: DRILL-4186
 URL: https://issues.apache.org/jira/browse/DRILL-4186
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Umbrella for queue enhancements



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-11 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4187:
---

 Summary: Introduce a state to separate queries pending execution 
from those pending in the queue.
 Key: DRILL-4187
 URL: https://issues.apache.org/jira/browse/DRILL-4187
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Hanifi Gunes


Currently, queries pending in the queue are not listed in the web UI; 
besides, we use the state PENDING to mean pending execution. This issue 
proposes (i) to list enqueued queries in the web UI and (ii) to introduce a 
new state for queries sitting in the queue, differentiating them from those 
pending execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4180) IllegalArgumentException while reading JSON files

2015-12-09 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-4180.
-
Resolution: Fixed

Fixed by 539cbba 

> IllegalArgumentException while reading JSON files
> -
>
> Key: DRILL-4180
> URL: https://issues.apache.org/jira/browse/DRILL-4180
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.5.0
>
> Attachments: a.json, b.json
>
>
> First of all, this issue can be reproduced when drill runs on distributed 
> mode.
> We have two json files in distributed file system. The type for the column is 
> MAP and there is not schema change on the top level. However, in the one 
> layer deeper in this column, the first file has one NullableBit column, which 
> does not appear in the second file. 
> The issue can be reproduced by the files in the attachment and this query :
> {code}
> select jsonFieldMapLevel1_aaa from directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Announcing new committer: Kristine Hahn

2015-12-04 Thread Hanifi Gunes
Congrats & welcome Kristine!

On Fri, Dec 4, 2015 at 10:51 AM, Khurram Faraaz 
wrote:

> Congratulations, Kris!
>
> On Fri, Dec 4, 2015 at 10:43 AM, Jinfeng Ni  wrote:
>
> > Congratulations, Kristine!
> >
> >
> >
> >
> > On Fri, Dec 4, 2015 at 10:33 AM, Bridget Bevens 
> > wrote:
> > > Congratulations, Kris!
> > >
> > > On Fri, Dec 4, 2015 at 10:09 AM, Zelaine Fong 
> > wrote:
> > >
> > >> Congrats, Kristine!
> > >>
> > >> On Fri, Dec 4, 2015 at 9:56 AM, Bob Rumsby 
> > wrote:
> > >>
> > >> > Congratulations Kris!
> > >> >
> > >> > On Fri, Dec 4, 2015 at 9:46 AM, Jason Altekruse <
> > >> altekruseja...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Congrats Kris! Well deserved, the docs are looking great!
> > >> > >
> > >> > > On Fri, Dec 4, 2015 at 9:36 AM, Sudheesh Katkam <
> > skat...@maprtech.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Congratulations and welcome, Kris!
> > >> > > >
> > >> > > > > On Dec 4, 2015, at 9:19 AM, Jacques Nadeau <
> jacq...@dremio.com>
> > >> > wrote:
> > >> > > > >
> > >> > > > > The Apache Drill PMC is very pleased to announce Kristine Hahn
> > as a
> > >> > new
> > >> > > > > committer.
> > >> > > > >
> > >> > > > > Kris has worked tirelessly on creating and improving the Drill
> > >> > > > > documentation. She has been extraordinary in her engagement
> with
> > >> the
> > >> > > > > community and has greatly accelerated the speed to resolution
> of
> > >> doc
> > >> > > > issues
> > >> > > > > and improvements.
> > >> > > > >
> > >> > > > > Welcome Kristine!
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


[discuss] Enhance queue support to take query cost & available cluster resources into account

2015-11-25 Thread Hanifi GUNES
Hey devs,

I have created an issue [1] to enhance queue support to take query cost &
available cluster resources into account. Please take your time to comment
under the issue if you have any feedback.

Thanks.
-Hanifi

1: https://issues.apache.org/jira/browse/DRILL-4136


[jira] [Created] (DRILL-4136) Enhance queue support to take query cost & available cluster resources into account

2015-11-25 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4136:
---

 Summary: Enhance queue support to take query cost & available 
cluster resources into account
 Key: DRILL-4136
 URL: https://issues.apache.org/jira/browse/DRILL-4136
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.3.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Current queue support relies on a distributed semaphore around a fixed 
pre-defined number. This semaphore indicates the number of queries Drill can 
run concurrently. Presently, we define small and large queues, where we 
classify queries based on a threshold and use two semaphores around the small 
and large queues individually. 

This issue proposes to come up with an enhanced queueing, or query dispatch, 
mechanism where a query is granted execution based on its cost and the 
availability of system resources (cpu, io, memory, etc). Enhancing cost 
planning and introducing distributed resource management should be addressed 
later to fully benefit from this enhancement. The proposal is a non-blocking 
and asynchronous mechanism that assumes eventual consistency around available 
system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Announcing new committer: Ellen Friedman

2015-11-23 Thread Hanifi GUNES
Congrats Ellen!

2015-11-23 8:56 GMT-08:00 Hsuan Yi Chu :

> Congrats!
>
> On Mon, Nov 23, 2015 at 8:50 AM, Sudheesh Katkam 
> wrote:
>
> > Congratulations and welcome, Ellen!
> >
> > > On Nov 23, 2015, at 12:32 AM, Julian Hyde  wrote:
> > >
> > > Congratulations, Ellen! Thanks for all you’ve done for Drill so far.
> > >
> > > Julian
> > >
> > >
> > >> On Nov 22, 2015, at 5:50 PM, Worthy LaFollette 
> > wrote:
> > >>
> > >> Congrats, Welcome!
> > >>
> > >> On Sun, Nov 22, 2015 at 6:38 PM, Jacques Nadeau 
> > wrote:
> > >>
> > >>> The Apache Drill PMC is very pleased to announce Ellen Friedman as a
> > new
> > >>> committer.
> > >>>
> > >>> Ellen has spent countless hours supporting Drill in a variety of ways
> > >>> including speaking at conferences, helping promote Drill via Twitter
> > and
> > >>> setting up and running various meetups. She has been a friend of
> Apache
> > >>> Drill since day one and we welcome her in her new capacity as a
> > committer.
> > >>>
> > >>> Welcome Ellen!
> > >>>
> > >
> >
> >
>


Re: Moving directory based pruning to fire earlier

2015-11-23 Thread Hanifi GUNES
The general idea of multi-phase pruning makes sense to me. I am wondering,
though: are we referring to introducing a new planning phase before the
logical one, or to separating out the logic so as to make directory pruning
kick off ahead of column partitioning?

2015-11-23 10:33 GMT-08:00 Mehant Baid :

> As part of DRILL-3996 
> Jinfeng mentioned that he plans to move the directory based pruning rule
> earlier than column based pruning. I want to expand on that a little,
> provide the motivation and gather thoughts/ feedback.
>
> Currently both the directory based pruning and the column based pruning are
> fired in the same planning phase and are based on Drill logical rels. This
> is not optimal in the case where data is organized in such a way that both
> directory based pruning and column based pruning can be applied (when the
> data is organized with a nested directory structure plus the individual
> files contain partition columns). As part of creating the Drill logical
> scan we read the footers of all the files involved. If the directory based
> pruning rule is fired earlier (rule to fire based on calcite logical rels)
> then we will be able to prune out unnecessary directories and save the work
> of reading the footers of these files.
>
> Thanks
> Mehant
>
>


[jira] [Created] (DRILL-4123) DirectoryExplorers should refer to fully qualified variable names

2015-11-23 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4123:
---

 Summary: DirectoryExplorers should refer to fully qualified 
variable names
 Key: DRILL-4123
 URL: https://issues.apache.org/jira/browse/DRILL-4123
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Execution fails with {code}CompileException: Line 75, Column 70: Unknown 
variable or type "FILE_SEPARATOR"{code} in case a directory explorer is used in 
a projection. Also FILE_SEPARATOR should not be platform dependent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4117) Ensure proper null outcome handling during FileSelection creation and its subclasses.

2015-11-20 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4117:
---

 Summary: Ensure proper null outcome handling during FileSelection 
creation and its subclasses.
 Key: DRILL-4117
 URL: https://issues.apache.org/jira/browse/DRILL-4117
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes


Hakim identified the following does not make a null check upon the result of  
FileSelection.create(...). This issue is to ensure proper null outcome handling 
during FileSelection creation and its subclasses or to return a non-null 
default type.

{quote}
onFileSystemPartitionDescriptor.createNewGroupScan() passes the output to 
FileGroupScan.close() which expects it to be not null
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Hanifi GUNES
The problem with the above code is that not all vectors operate on root
buffers, rendering the accounting above inaccurate. In fact, your example is
one perfect instance where vectors would point to non-root buffers for sure,
because of the slicing taking place at RecordBatchLoader#load [1]


1:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117
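
A small demonstration of the mismatch with plain Netty buffers (DrillBuf
wraps Netty's ByteBuf; netty-buffer on the classpath is assumed):

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class SliceAccounting {
  public static void main(String[] args) {
    ByteBuf root = Unpooled.buffer(4096);
    ByteBuf slice = root.slice(0, 128);
    // The slice is not a root buffer, so the accounting loop skips it even
    // though it keeps all 4096 bytes of the root buffer alive.
    System.out.println(slice.capacity());          // 128  (the slice's view)
    System.out.println(slice.unwrap().capacity()); // 4096 (the root buffer)
  }
}
{code}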

2015-11-20 11:41 GMT-08:00 Steven Phillips :

> I think it is because we can't actually properly account for sliced
> buffers. I don't remember for sure, but I think it might be because calling
> buf.capacity() on a sliced buffer returns the capacity of the root buffer,
> not the size of the slice. That may not be correct, but I think it was
> something like that. Whatever it is, I am pretty sure it was giving wrong
> results when they are sliced buffers.
>
> I think we need to get the new allocator, along with proper transfer of
> ownership in order to do this correctly. Then we can just query the
> allocator rather than trying to track it separately.
>
> On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > wrote:
>
> > I'm looking at the external sort code and it uses the following method to
> > compute the allocated size of a batch:
> >
> >   private long getBufferSize(VectorAccessible batch) {
> > > long size = 0;
> > > for (VectorWrapper<?> w : batch) {
> > >   DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > >   for (DrillBuf buf : bufs) {
> > > if (*buf.isRootBuffer()*) {
> > >   size += buf.capacity();
> > > }
> > >   }
> > > }
> > > return size;
> > >   }
> >
> >
> > This method only accounts for root buffers, but when we have a receiver
> > below the sort, most of (if not all) buffers are child buffers. This may
> > delay spilling, and increase the memory usage of the drillbit. If my
> > computations are correct, for a single query, one drillbit can allocate
> up
> > to 40GB without spilling once to disk.
> >
> > Is there a specific reason we only account for root buffers ?
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > >
> >
>


Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Hanifi GUNES
Seems like I am missing some input here. Are we talking about changing the
behavior of DrillBuf#capacity()? If so, I would +1 the following:

*- It seems like capacity should return the length of the slice (since
I believe that fits with the general ByteBuf interface). If I
reset writerIndex(0), I should be able to write to capacity() without issue.*

2015-11-20 16:03 GMT-08:00 Jacques Nadeau <jacq...@dremio.com>:

> My gut is that any changes other than correct accounting will have little
> impact on real world situations. Do you think the change will make enough
> difference to be valuable?
>
> It seems like capacity should return the length of the slice (since I
> believe that fits with the general ByteBuf interface). If I reset
> writerIndex(0), I should be able to write to capacity() without issue. Seems
> weird to return some other (mostly disconnect) value.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 20, 2015 at 3:28 PM, Abdel Hakim Deneche <
> adene...@maprtech.com>
> wrote:
>
> > @Steven, I think DrillBuf.capacity() was changed at some point, I was
> > looking at the code and it seems to only return the size of the "slice"
> and
> > not the root buffer.
> >
> > While waiting for the new allocator and transfer of ownership, would it
> > make sense to remove the check for root buffers like this ?
> >
> >   private long getBufferSize(VectorAccessible batch) {
> > > long size = 0;
> > > for (VectorWrapper w : batch) {
> > >   DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > >   for (DrillBuf buf : bufs) {
> > > size += buf.capacity();
> > >   }
> > > }
> > > return size;
> > >   }
> >
> >
> >
> > On Fri, Nov 20, 2015 at 2:50 PM, Hanifi GUNES <hanifigu...@gmail.com>
> > wrote:
> >
> > > Problem with the above code is that not all vectors operate on root
> > buffers
> > > rendering the accounting above inaccurate. In fact your example is one
> > > perfect instance where vectors would point to non-root buffers for sure
> > > because of the slicing taking place at RecordBatchLoader#load [1]
> > >
> > >
> > > 1:
> > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117
> > >
> > > 2015-11-20 11:41 GMT-08:00 Steven Phillips <ste...@dremio.com>:
> > >
> > > > I think it is because we can't actually properly account for sliced
> > > > buffers. I don't remember for sure, but I think it might be because
> > > calling
> > > > buf.capacity() on a sliced buffer returns the capacity of the root
> > > buffer,
> > > > not the size of the slice. That may not be correct, but I think it
> was
> > > > something like that. Whatever it is, I am pretty sure it was giving
> > wrong
> > > > results when they are sliced buffers.
> > > >
> > > > I think we need to get the new allocator, along with proper transfer
> of
> > > > ownership in order to do this correctly. Then we can just query the
> > > > allocator rather than trying to track it separately.
> > > >
> > > > On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche <
> > > > adene...@maprtech.com
> > > > > wrote:
> > > >
> > > > > I'm looking at the external sort code and it uses the following
> > method
> > > to
> > > > > compute the allocated size of a batch:
> > > > >
> > > > >   private long getBufferSize(VectorAccessible batch) {
> > > > > > long size = 0;
> > > > > > for (VectorWrapper w : batch) {
> > > > > >   DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > > > > >   for (DrillBuf buf : bufs) {
> > > > > > if (*buf.isRootBuffer()*) {
> > > > > >   size += buf.capacity();
> > > > > > }
> > > > > >   }
> > > > > > }
> > > > > > return size;
> > > > > >   }
> > > > >
> > > > >
> > > > > This method only accounts for root buffers, but when we have a
> > receiver
> > > > > below the sort, most of (if not all) buffers are child buffers.
> This
> > > may
> > > > > delay spilling, and increase the memory usage of the drillbit. If
> my
> > > > > computations are correct, for a single query, one drillbit can
> > allocate
> > > > up
> > > > > to 40GB without spilling once to disk.
> > > > >
> > > > > Is there a specific reason we only account for root buffers ?
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   <http://www.mapr.com/>
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > >
> >
>
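
For readers following the capacity() question: with stock Netty 4 a slice reports its
own length, which matches Hakim's reading of the current code (DrillBuf's historical
behavior may have differed, which is likely what Steven remembers). A quick check
against plain Netty:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class SliceCapacityCheck {
  public static void main(String[] args) {
    ByteBuf root = Unpooled.buffer(1024);  // root buffer with capacity 1024
    ByteBuf slice = root.slice(0, 256);    // sliced view over the first 256 bytes

    System.out.println(root.capacity());         // 1024
    System.out.println(slice.capacity());        // 256 -- the slice's own length
    System.out.println(slice.unwrap() == root);  // true -- the slice shares the root's memory

    root.release();
  }
}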


Re: Unusually Long Build Time

2015-11-16 Thread Hanifi Gunes
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Drill Root POM .. SUCCESS [ 13.174 s]
[INFO] tools/Parent Pom ... SUCCESS [  0.769 s]
[INFO] tools/freemarker codegen tooling ... SUCCESS [  7.586 s]
[INFO] Drill Protocol . SUCCESS [  7.642 s]
[INFO] Common (Logical Plan, Base expressions)  SUCCESS [  6.029 s]
[INFO] Logical Plan, Base expressions . SUCCESS [  9.719 s]
[INFO] exec/Parent Pom  SUCCESS [  0.936 s]
[INFO] exec/memory/Parent Pom . SUCCESS [  0.579 s]
[INFO] exec/memory/base ... SUCCESS [  2.332 s]
[INFO] exec/memory/impl ... SUCCESS [  3.431 s]
[INFO] exec/rpc ... SUCCESS [  2.968 s]
[INFO] exec/Vectors ... SUCCESS [01:10 min]
[INFO] contrib/Parent Pom . SUCCESS [  0.929 s]
[INFO] contrib/data/Parent Pom  SUCCESS [  0.669 s]
[INFO] contrib/data/tpch-sample-data .. SUCCESS [  4.139 s]
[INFO] exec/Java Execution Engine . SUCCESS [ 49.838 s]
[INFO] exec/JDBC Driver using dependencies  SUCCESS [ 13.623 s]
[INFO] JDBC JAR with all dependencies . SUCCESS [ 29.194 s]
[INFO] contrib/mongo-storage-plugin ... SUCCESS [  6.740 s]
[INFO] contrib/hbase-storage-plugin ... SUCCESS [ 12.197 s]
[INFO] contrib/jdbc-storage-plugin  SUCCESS [ 33.667 s]
[INFO] contrib/hive-storage-plugin/Parent Pom . SUCCESS [  0.833 s]
[INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SUCCESS [ 18.817 s]
[INFO] contrib/hive-storage-plugin/core ... SUCCESS [ 10.109 s]
[INFO] contrib/drill-gis-plugin ... SUCCESS [  4.866 s]
[INFO] Packaging and Distribution Assembly  SUCCESS [ 39.002 s]
[INFO] contrib/sqlline  SUCCESS [  5.731 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 05:57 min


Here is my latest build @cd01107. Looks like all of your modules are
orders of magnitude slower.


On Mon, Nov 16, 2015 at 10:24 AM, Hanifi GUNES <hanifigu...@gmail.com>
wrote:

> Hey, let's not be a pessimist. It is all success =p Joke aside, I have been
> building on multiple platforms, will report my timings but what does your
> baseline/usual runtime look like?
>
> 2015-11-16 10:18 GMT-08:00 Sudheesh Katkam <skat...@maprtech.com>:
>
> > Hey y’all,
> >
> > On master (cd01107), building (mvn clean install -DskipTests) takes
> > unusually long on my Mac. Looks like every module is taking longer. This
> > happens on multiple runs.
> >
> > ➜  drill git:(master) export -p
> > typeset -x
> > JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_75.jdk/Contents/Home
> > ...
typeset -x MAVEN_OPTS='-Xmx1g -XX:MaxPermSize=256m'
> > ...
> > typeset -x
> >
> PATH=/Users/skatkam/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
> > typeset -x PWD=/Users/skatkam/Documents/drill
> > ...
> >
> > ➜  drill git:(master) mvn -version
> > Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1;
> > 2014-12-14T09:29:23-08:00)
> > Maven home: /usr/local/Cellar/maven/3.2.5/libexec
> > Java version: 1.7.0_75, vendor: Oracle Corporation
> > Java home:
> > /Library/Java/JavaVirtualMachines/jdk1.7.0_75.jdk/Contents/Home/jre
> > Default locale: en_US, platform encoding: UTF-8
> > OS name: "mac os x", version: "10.11.1", arch: "x86_64", family: "mac"
> >
> > ➜  drill git:(master) mvn clean install -DskipTests
> > ...
> > [INFO]
> > 
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Apache Drill Root POM .. SUCCESS
> [01:49
> > min]
> > [INFO] tools/Parent Pom ... SUCCESS [
> > 43.263 s]
> > [INFO] tools/freemarker codegen tooling ... SUCCESS [
> > 47.562 s]
> > [INFO] Drill Protocol . SUCCESS
> [01:15
> > min]
> > [INFO] Common (Logical Plan, Base expressions)  SUCCESS
> [01:05
> > min]
> > [INFO] Logical Plan, Base expressions . SUCCESS
> [01:07
> > min]
> > [INFO] e

Re: Unusually Long Build Time

2015-11-16 Thread Hanifi GUNES
Hey, let's not be a pessimist. It is all success =p Joke aside, I have been
building on multiple platforms, will report my timings but what does your
baseline/usual runtime look like?

2015-11-16 10:18 GMT-08:00 Sudheesh Katkam :

> Hey y’all,
>
> On master (cd01107), building (mvn clean install -DskipTests) takes
> unusually long on my Mac. Looks like every module is taking longer. This
> happens on multiple runs.
>
> ➜  drill git:(master) export -p
> typeset -x
> JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_75.jdk/Contents/Home
> ...
> typeset -x MAVEN_OPTS='-Xmx1g -XX:MaxPermSize=256m'
> ...
> typeset -x
> PATH=/Users/skatkam/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
> typeset -x PWD=/Users/skatkam/Documents/drill
> ...
>
> ➜  drill git:(master) mvn -version
> Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1;
> 2014-12-14T09:29:23-08:00)
> Maven home: /usr/local/Cellar/maven/3.2.5/libexec
> Java version: 1.7.0_75, vendor: Oracle Corporation
> Java home:
> /Library/Java/JavaVirtualMachines/jdk1.7.0_75.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.11.1", arch: "x86_64", family: "mac"
>
> ➜  drill git:(master) mvn clean install -DskipTests
> ...
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Drill Root POM .. SUCCESS [01:49
> min]
> [INFO] tools/Parent Pom ... SUCCESS [
> 43.263 s]
> [INFO] tools/freemarker codegen tooling ... SUCCESS [
> 47.562 s]
> [INFO] Drill Protocol . SUCCESS [01:15
> min]
> [INFO] Common (Logical Plan, Base expressions)  SUCCESS [01:05
> min]
> [INFO] Logical Plan, Base expressions . SUCCESS [01:07
> min]
> [INFO] exec/Parent Pom  SUCCESS [
> 40.882 s]
> [INFO] exec/memory/Parent Pom . SUCCESS [
> 37.143 s]
> [INFO] exec/memory/base ... SUCCESS [
> 46.316 s]
> [INFO] exec/memory/impl ... SUCCESS [01:05
> min]
> [INFO] exec/rpc ... SUCCESS [01:21
> min]
> [INFO] exec/Vectors ... SUCCESS [02:16
> min]
> [INFO] contrib/Parent Pom . SUCCESS [
> 30.066 s]
> [INFO] contrib/data/Parent Pom  SUCCESS [
> 25.249 s]
> [INFO] contrib/data/tpch-sample-data .. SUCCESS [
> 37.469 s]
> [INFO] exec/Java Execution Engine . SUCCESS [05:56
> min]
> [INFO] exec/JDBC Driver using dependencies  SUCCESS [01:35
> min]
> [INFO] JDBC JAR with all dependencies . SUCCESS [01:30
> min]
> [INFO] contrib/mongo-storage-plugin ... SUCCESS [01:06
> min]
> [INFO] contrib/hbase-storage-plugin ... SUCCESS [01:08
> min]
> [INFO] contrib/jdbc-storage-plugin  SUCCESS [01:41
> min]
> [INFO] contrib/hive-storage-plugin/Parent Pom . SUCCESS [
> 22.398 s]
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SUCCESS [01:04
> min]
> [INFO] contrib/hive-storage-plugin/core ... SUCCESS [01:37
> min]
> [INFO] contrib/drill-gis-plugin ... SUCCESS [
> 42.391 s]
> [INFO] Packaging and Distribution Assembly  SUCCESS [
> 49.298 s]
> [INFO] contrib/sqlline  SUCCESS [
> 22.623 s]
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 33:11 min
> [INFO] Finished at: 2015-11-16T09:48:45-08:00
> [INFO] Final Memory: 141M/593M
> [INFO]
> 
> [INFO] #stop(53981): in 91.00µs
>
> Thank you,
> Sudheesh


[jira] [Resolved] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-10 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-2288.
-
Resolution: Fixed

Fixed by a0be3ae0a5a69634be98cc517bcc31c11ffec91d

> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing 
> JDBC metadata (schema) for 0-row results...]
> ---
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.4.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up 
> (getColumnCount() returns zero, and trying to access any other metadata 
> throws IndexOutOfBoundsException) for a result set with zero rows, at least 
> for one from DatabaseMetaData.getColumns(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
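
In JDBC terms, the expectation the fix restores is that result-set metadata describes
the schema even when no rows come back. A hypothetical check -- the connection and
table name are placeholders:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

class ZeroRowMetadataCheck {
  // Before the fix this returned 0 for a zero-row result (and other metadata
  // calls threw IndexOutOfBoundsException); after it, the real column count.
  static int emptyResultColumnCount(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM dfs.tmp.`t` WHERE 1 = 0")) {
      ResultSetMetaData md = rs.getMetaData();
      return md.getColumnCount();
    }
  }
}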


Re: [VOTE] Release Apache Drill 1.3.0 (rc1)

2015-11-09 Thread Hanifi Gunes
-1 (binding)

5/5 consecutive runs I get the following consistently with forkCount set to
1 on build.

Tests in error:
  TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
  
TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.
   updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
  236->BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222-
 >BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 »
IllegalState
  
TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
236->BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236-
  >BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
  236->BaseTestQuery.updateClient:213 » IllegalState


On Mon, Nov 9, 2015 at 2:19 PM, Stefán Baxter 
wrote:

> Hi,
>
> I'm not a committer and not voting but I want to point out that the UDFs
> that we are using and run with 1.1 and 1.2 do not, for some reason, run
> with 1.3.
>
> I'm inclined to insist that it's something we are doing wrong but if this
> counts as a test then it's failing.
>
> Regards,
>  -Stefan
>
> On Mon, Nov 9, 2015 at 10:09 PM, Norris Lee  wrote:
>
> > +1 Non-binding.
> >
> > Built from source on Linux, tested ODBC against various data sources
> > (Hive, DFS-csv, tsv, parquet, json)
> >
> > Norris
> >
> > -Original Message-
> > From: Abdel Hakim Deneche [mailto:adene...@maprtech.com]
> > Sent: Monday, November 09, 2015 10:16 AM
> > To: dev@drill.apache.org
> > Subject: Re: [VOTE] Release Apache Drill 1.3.0 (rc1)
> >
> > +1
> >
> > Checked checksum files and gpg signature for all 3 files. Built from
> > source, deployed on a 4 nodes cluster then run several window function
> > queries on TPCDS SF100. Looks great.
> >
> > On Mon, Nov 9, 2015 at 8:14 AM, Jacques Nadeau 
> wrote:
> >
> > > I've seen another instance of DRILL-4041. Since it is exceedingly
> > > rare, I don't think it is a release stopper. I will spend some more
> > > time reviewing it.
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sun, Nov 8, 2015 at 7:34 PM, Amit Hadke 
> wrote:
> > >
> > > > +1, non-binding
> > > > Downloaded tarball, mvn clean install. Ran join queries, was able to
> > > > run queries on json, parquet and sequence files.
> > > >
> > > > On Sun, Nov 8, 2015 at 10:01 AM, Abhijit Pol 
> > wrote:
> > > >
> > > > > resending the vote with details
> > > > >
> > > > > +1, non-binding
> > > > >
> > > > > ran maven tests, installed on Mac, ran drill in embedded mode,
> > > > > verified
> > > > Web
> > > > > UI for CSV header config changes, ran few queries on CSV files
> > > > > with and without header, ran few queries against large CSV files
> > > > > with header in
> > > > HDFS
> > > > >
> > > > > On Sun, Nov 8, 2015 at 9:07 AM, Hsuan Yi Chu 
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > > mvn clean install on mac and linux
> > > > > >
> > > > > > tried a few queries on distributed mode.
> > > > > >
> > > > > > Things looked fine
> > > > > >
> > > > > > On Sun, Nov 8, 2015 at 8:16 AM, Aman Sinha 
> > > > wrote:
> > > > > >
> > > > > > > +1.
> > > > > > > Downloaded the binary tar ball and installed on Mac.  Ran
> > > > > > > Drill in
> > > > > > embedded
> > > > > > > mode.  Tested a few join, aggregation and limit queries
> > > > > > > against
> > > > parquet
> > > > > > > data with and without metadata cache.  Tested cancellation for
> > > couple
> > > > > of
> > > > > > > queries.  Looked at query profiles on the WebUI.   Looks good.
> > > > > > >
> > > > > > > On Sat, Nov 7, 2015 at 11:53 PM, Abhijit Pol
> > > > > > > 
> > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > On Sat, Nov 7, 2015 at 9:41 PM, andrew 
> > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding, non-comitter)
> > > > > > > > >
> > > > > > > > > > On Nov 6, 2015, at 10:15 PM, Jacques Nadeau <
> > > > jacq...@dremio.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hey Everybody,
> > > > > > > > > >
> > > > > > > > > > I'd like to propose a new release candidate of Apache
> > > > > > > > > > Drill,
> > > > > > version
> > > > > > > > > > 

Re: Announcing new committer: Sudheesh Katkam

2015-11-09 Thread Hanifi Gunes
Congrats!

On Mon, Nov 9, 2015 at 10:33 AM, Raghu Devata  wrote:

> Congrats Sudheesh !! :)
>
>
> Regards,
> Raghu Devata.
>
> On Mon, Nov 9, 2015 at 1:26 PM, AnilKumar B  wrote:
>
> > Congrats Sudheesh.
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
> > On Mon, Nov 9, 2015 at 10:33 AM, Hsuan Yi Chu 
> wrote:
> >
> > > Congrats Sudheesh!
> > >
> > > On Sun, Nov 8, 2015 at 7:10 PM, Amit Hadke 
> wrote:
> > >
> > > > Congrats Sudheesh!
> > > >
> > > > On Sun, Nov 8, 2015 at 6:27 PM, Abdel Hakim Deneche <
> > > adene...@maprtech.com
> > > > >
> > > > wrote:
> > > >
> > > > > Congrats Sudheesh!
> > > > >
> > > > > On Sun, Nov 8, 2015 at 5:16 PM, Abhishek Girish <
> > > > abhishek.gir...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Congrats Sudheesh!
> > > > > >
> > > > > > On Sunday, November 8, 2015, Khurram Faraaz <
> kfar...@maprtech.com>
> > > > > wrote:
> > > > > >
> > > > > > > Congrats Sudheesh!
> > > > > > >
> > > > > > > On Sun, Nov 8, 2015 at 3:13 PM, Jacques Nadeau <
> > jacq...@dremio.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > The Apache Drill PMC is very proud to announce Sudheesh
> Katkam
> > > as a
> > > > > > > > new committer.
> > > > > > > >
> > > > > > > > Sudheesh has done a great deal of work on Drill and it is an
> > > honor
> > > > to
> > > > > > now
> > > > > > > > have him as a committer. We're lucky to have him as part of
> our
> > > > > > > community.
> > > > > > > >
> > > > > > > > Welcome Sudheesh!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Cheers!
> Raghu D
>


Re: [VOTE] Release Apache Drill 1.3.0 (rc1)

2015-11-09 Thread Hanifi Gunes
Using the default parallelism, I get the same set of exceptions on CentOS.

Also max # of threads per process is the system default, 319548.

On Mon, Nov 9, 2015 at 4:13 PM, Amit Hadke <amit.ha...@gmail.com> wrote:

> Unfortunately, yes I changed them at OS level.
>
> I agree that making OS level changes to run tests is not something we
> should advocate.
> It was just an observation that tests seem to pass on linux with default
> settings but on mac behavior is irregular.
>
> ~ Amit.
>
>
> On Mon, Nov 9, 2015 at 3:48 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > Amit,
> >
> > Did you change these at the OS level? I would hope that we wouldn't have
> to
> > make people jump through such hoops to run the build, was there a thread
> I
> > missed where we discussed this something that we should be expecting
> > everyone to do? I did see the threads about increasing the maven permgen
> > and did so accordingly, but I think adjusting something like the system
> > ulimit is something we really should try to avoid.
> >
> > On Mon, Nov 9, 2015 at 3:44 PM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > >
> > wrote:
> >
> > > I just tried with the default forkCount and got similar failures. I'm
> > > running the tests on a linux machine with the default maven options
> > >
> > > Thanks
> > >
> > > On Mon, Nov 9, 2015 at 3:21 PM, Jacques Nadeau <jacq...@dremio.com>
> > wrote:
> > >
> > > > So you're changing the forkCount from the default? What happens when
> > you
> > > > run with the default?
> > > >
> > > > Also, did you guys run with extra permgen or the default? Also, can
> you
> > > > confirm what type of machine you are running on?
> > > >
> > > > Want to figure out why things are different.
> > > >
> > > > thanks,
> > > > Jacques
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Mon, Nov 9, 2015 at 2:23 PM, Abdel Hakim Deneche <
> > > adene...@maprtech.com
> > > > >
> > > > wrote:
> > > >
> > > > > Saw similar failures when running unit tests with forkCount = 1:
> > > > >
> > > > > Tests in error:
> > > > > >
> >  TestImpersonationQueries.sequenceFileChainedImpersonationWithView »
> > > > > > UserRemote
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » Rpc
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » IllegalState
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » IllegalState
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » IllegalState
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » IllegalState
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > » IllegalState
> > > > > 

Re: [VOTE] Release Apache Drill 1.3.0 (rc1)

2015-11-09 Thread Hanifi Gunes
1024/4096/2032195 <==> soft/hard/system

On Mon, Nov 9, 2015 at 4:50 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> What about open files?
> On Nov 9, 2015 4:20 PM, "Hanifi Gunes" <hgu...@maprtech.com> wrote:
>
> > Using the default parallelism, I get the same set of exceptions on
> CentOS.
> >
> > Also max # of threads per process is the system default, 319548.
> >
> > On Mon, Nov 9, 2015 at 4:13 PM, Amit Hadke <amit.ha...@gmail.com> wrote:
> >
> > > Unfortunately, yes I changed them at OS level.
> > >
> > > I agree that making OS level changes to run tests is not something we
> > > should advocate.
> > > It was just an observation that tests seem to pass on linux with
> default
> > > settings but on mac behavior is irregular.
> > >
> > > ~ Amit.
> > >
> > >
> > > On Mon, Nov 9, 2015 at 3:48 PM, Jason Altekruse <
> > altekruseja...@gmail.com>
> > > wrote:
> > >
> > > > Amit,
> > > >
> > > > Did you change these at the OS level? I would hope that we wouldn't
> > have
> > > to
> > > > make people jump through such hoops to run the build, was there a
> > thread
> > > I
> > > > missed where we discussed this something that we should be expecting
> > > > everyone to do? I did see the threads about increasing the maven
> > permgen
> > > > and did so accordingly, but I think adjusting something like the
> system
> > > > ulimit is something we really should try to avoid.
> > > >
> > > > On Mon, Nov 9, 2015 at 3:44 PM, Abdel Hakim Deneche <
> > > adene...@maprtech.com
> > > > >
> > > > wrote:
> > > >
> > > > > I just tried with the default forkCount and got similar failures.
> I'm
> > > > > running the tests on a linux machine with the default maven options
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, Nov 9, 2015 at 3:21 PM, Jacques Nadeau <jacq...@dremio.com
> >
> > > > wrote:
> > > > >
> > > > > > So you're changing the forkCount from the default? What happens
> > when
> > > > you
> > > > > > run with the default?
> > > > > >
> > > > > > Also, did you guys run with extra permgen or the default? Also,
> can
> > > you
> > > > > > confirm what type of machine you are running on?
> > > > > >
> > > > > > Want to figure out why things are different.
> > > > > >
> > > > > > thanks,
> > > > > > Jacques
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Mon, Nov 9, 2015 at 2:23 PM, Abdel Hakim Deneche <
> > > > > adene...@maprtech.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Saw similar failures when running unit tests with forkCount =
> 1:
> > > > > > >
> > > > > > > Tests in error:
> > > > > > > >
> > > >  TestImpersonationQueries.sequenceFileChainedImpersonationWithView »
> > > > > > > > UserRemote
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > > > » Rpc
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213
> > > > > > > > » IllegalState
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> TestImpersonationQueries.avroChained

Re: [VOTE] Release Apache Drill 1.3.0 (rc0)

2015-11-06 Thread Hanifi Gunes
Looks like we are possibly leaking some threads. Investigating.

On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Hmm.. that is quite strange. I wonder if we need to look at thread counts
> on the daemon.
>
> We haven't changed how we create but there were changes to shutdown
> (although I can't imagine why that would be a problem).
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes <hgu...@maprtech.com> wrote:
>
> > Not the testAggregateWithEmptyRequiredInput but I got the following on
> > my branch rebased on top of master -- @CentOS.
> >
> > Tests in error:
> >   TestImpersonationQueries.sequenceFileChainedImpersonationWithView »
> > UserRemote
> >
> >
> TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.
> >updateClient:236->BaseTestQuery.updateClient:213 » Rpc
> >
> >
> TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
> >   236->BaseTestQuery.updateClient:213 » IllegalState
> >
> >
> TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222-
> >  >BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 »
> > IllegalState
> >
> >
> TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
> > 236->BaseTestQuery.updateClient:213 » IllegalState
> >
> >
> TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236-
> >   >BaseTestQuery.updateClient:213 » IllegalState
> >
> >
> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
> >   236->BaseTestQuery.updateClient:213 » IllegalState
> >
> > exception details --->
> >
> >
> >
> testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
> >  Time elapsed: 0.008 sec  <<<   ERROR!
> > java.lang.IllegalStateException: failed to create a child event loop
> >   at sun.nio.ch.IOUtil.makePipe(Native Method)
> >   at
> io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
> >   at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
> >   at
> >
> io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
> >   at
> >
> io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
> >   at
> >
> io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
> >   at
> > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
> >   at
> > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
> >   at
> >
> org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
> >   at
> >
> org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
> >   at
> org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
> >   at
> org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
> >   at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
> >   at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
> >   at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
> >
> >
> > My gut's telling me that we are creating too many NioEventLoopGroups.
> > Did we make any recent changes around RPC causing this?
> >
> > -Hanifi
> >
> >
> > On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >
> > > Do you have that other output/stack trace I asked about? If we can also
> > see
> > > the illegalreference count on something other than the JDBC client
> close
> > > method, that would be helpful.
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni <jinfengn...@gmail.com>
> > wrote:
> > >
> > > > I just re-run, and the previous 4 failures are gone. But it failed
> > > > with two new ones:
> > > >
> > &

Re: [VOTE] Release Apache Drill 1.3.0 (rc0)

2015-11-06 Thread Hanifi Gunes
Not the testAggregateWithEmptyRequiredInput but I got the following on
my branch rebased on top of master -- @CentOS.

Tests in error:
  TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
  
TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.
   updateClient:236->BaseTestQuery.updateClient:213 » Rpc
  
TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
  236->BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222-
 >BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 »
IllegalState
  
TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
236->BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236-
  >BaseTestQuery.updateClient:213 » IllegalState
  
TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
  236->BaseTestQuery.updateClient:213 » IllegalState

exception details --->

testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
 Time elapsed: 0.008 sec  <<<   ERROR!
java.lang.IllegalStateException: failed to create a child event loop
  at sun.nio.ch.IOUtil.makePipe(Native Method)
  at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
  at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
  at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
  at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
  at 
io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
  at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
  at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
  at 
org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
  at 
org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
  at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
  at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
  at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)


My gut's telling me that we are creating too many NioEventLoopGroups.
Did we make any recent changes around RPC causing this?

-Hanifi


On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau  wrote:

> Do you have that other output/stack trace I asked about? If we can also see
> the illegalreference count on something other than the JDBC client close
> method, that would be helpful.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni  wrote:
>
> > I just re-run, and the previous 4 failures are gone. But it failed
> > with two new ones:
> >
> > Tests in error:
> >
> >
> TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization
> > » UserRemote
> >
> >
> TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization
> > » UserRemote
> >
> > I re-start the machine, and there are not too many applications
> > running and the memory should be enough.  At least some days back, I
> > got clean run on the same machine.
> >
> >
> >
> >
> > On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau 
> wrote:
> > > Can you provide the complete output for this failure:
> > >
> > > TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 »
> > > IllegalReferenceCount
> > >
> > > I haven't seen the other issues. The last one looks like the system was
> > > having an issue since thread creation failure is usually an OS problem.
> > Was
> > > your system under resourced?
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni 
> > wrote:
> > >
> > >> I'm seeing unit test case failure when run "mvn clean install" over
> > >> drill master branch, on Mac.
> > >>
> > >> The first one seems to be the issue #3 in Jacques's list. The last
> > >> three seems to different from the 4 issues. Has anyone seen this
> > >> failure before, or it just happened to my mac? Thanks.
> > >>
> > >>
> > >> =
> > >> git log
> > >> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
> > >>
> > >>
> > >> 
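
A side note on the failure mode discussed above: the stack traces bottom out in
sun.nio.ch.IOUtil.makePipe, which fails once the process runs out of file descriptors,
and each NioEventLoopGroup eagerly opens a selector (plus pipe) per event loop. A tiny
reproduction of the suspected leak pattern -- groups created repeatedly and never shut
down -- for anyone who wants to confirm the theory locally:

import io.netty.channel.nio.NioEventLoopGroup;

public class EventLoopLeakRepro {
  public static void main(String[] args) {
    // Each iteration opens selector/pipe file descriptors. Without the
    // commented-out shutdown they accumulate until construction fails
    // with "failed to create a child event loop".
    for (int i = 0; i < 1_000_000; i++) {
      NioEventLoopGroup group = new NioEventLoopGroup(1);
      // group.shutdownGracefully().syncUninterruptibly();  // the missing cleanup
    }
  }
}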

[jira] [Resolved] (DRILL-3313) Eliminate redundant #load methods and unit-test loading & exporting of vectors

2015-11-02 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-3313.
-
Resolution: Fixed

Fixed by 77e2b89

> Eliminate redundant #load methods and unit-test loading & exporting of vectors
> --
>
> Key: DRILL-3313
> URL: https://issues.apache.org/jira/browse/DRILL-3313
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Data Types
>Affects Versions: 1.0.0
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
> Fix For: 1.3.0
>
>
> Vectors have multiple #load methods that are used to populate data from raw 
> buffers. It is relatively tough to reason about, maintain and unit-test loading and 
> exporting of data since there is much redundant code around the load methods. 
> This issue proposes to have a single #load method conforming to the VV#load(def, 
> buffer) signature, eliminating all other #load overrides.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
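
For reference, the consolidated signature reads roughly as below; SerializedField is
the vector's serialized metadata (from the UserBitShared protos) and DrillBuf the raw
buffer. Shown as an interface fragment, not the full ValueVector contract:

// Fragment only; all other ValueVector members are elided.
public interface ValueVector {
  /** Populate this vector from its serialized metadata plus the raw buffer. */
  void load(SerializedField metadata, DrillBuf buffer);
}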


Re: Drill mvn build fail on Mac ?

2015-10-29 Thread Hanifi Gunes
This started occurring after one of the recent maven refactoring changes
got checked in. Agreed that we should document this.

-H+

On Thu, Oct 29, 2015 at 4:47 PM, Jinfeng Ni  wrote:

> After setting the following option, I can get a successful build.
>
> "
> export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
> "
>
> If we have to change the default setting to get a successful build,
> then we should document it somewhere as a prerequisite for building
> Drill.
>
>
>
> On Thu, Oct 29, 2015 at 4:39 PM, Jinfeng Ni  wrote:
> > Hi,
> >
> > I tried the latest Drill master branch (commit id :
> > 1d067d26b1ba510f4a51489d62d2a6d0480a473c) with "mvn clean install
> > -DskipTests" on my Mac. However, it failed at Packaging and
> > Distribution Assembly. Has anyone seen this before?
> >
> >
> > [INFO]
> 
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Apache Drill Root POM .. SUCCESS [
> 4.706 s]
> > [INFO] tools/Parent Pom ... SUCCESS [
> 0.580 s]
> > [INFO] tools/freemarker codegen tooling ... SUCCESS [
> 4.553 s]
> > [INFO] Drill Protocol . SUCCESS [
> 5.834 s]
> > [INFO] Common (Logical Plan, Base expressions)  SUCCESS [
> 8.067 s]
> > [INFO] contrib/Parent Pom . SUCCESS [
> 0.581 s]
> > [INFO] contrib/data/Parent Pom  SUCCESS [
> 0.433 s]
> > [INFO] contrib/data/tpch-sample-data .. SUCCESS [
> 2.903 s]
> > [INFO] exec/Parent Pom  SUCCESS [
> 0.582 s]
> > [INFO] exec/Java Execution Engine . SUCCESS
> [01:07 min]
> > [INFO] exec/JDBC Driver using dependencies  SUCCESS [
> 8.263 s]
> > [INFO] JDBC JAR with all dependencies . SUCCESS [
> 19.962 s]
> > [INFO] contrib/mongo-storage-plugin ... SUCCESS [
> 4.360 s]
> > [INFO] contrib/hbase-storage-plugin ... SUCCESS [
> 8.317 s]
> > [INFO] contrib/jdbc-storage-plugin  SUCCESS [
> 4.042 s]
> > [INFO] contrib/hive-storage-plugin/Parent Pom . SUCCESS [
> 0.362 s]
> > [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SUCCESS [
> 16.286 s]
> > [INFO] contrib/hive-storage-plugin/core ... SUCCESS [
> 6.763 s]
> > [INFO] contrib/drill-gis-plugin ... SUCCESS [
> 4.465 s]
> > [INFO] Packaging and Distribution Assembly  SKIPPED
> > [INFO] contrib/sqlline  SKIPPED
> > [INFO]
> 
> > [INFO] BUILD SUCCESS
> > [INFO]
> 
> > [INFO] Total time: 02:56 min
> > [INFO] Finished at: 2015-10-29T16:00:05-07:00
> > [INFO] Final Memory: 139M/4020M
> > [INFO]
> 
> > [ERROR] PermGen space -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with
> > the -e switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions,
> > please read the following articles:
> > [ERROR] [Help 1]
> > http://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError
> > ---
> > constituent[0]:
> > file:/Applications/apache-maven-3.2.3/lib/aether-api-0.9.0.M2.jar
> > constituent[1]:
> >
> file:/Applications/apache-maven-3.2.3/lib/aether-connector-wagon-0.9.0.M2.jar
> > constituent[2]:
> > file:/Applications/apache-maven-3.2.3/lib/aether-impl-0.9.0.M2.jar
> > constituent[3]:
> > file:/Applications/apache-maven-3.2.3/lib/aether-spi-0.9.0.M2.jar
> > constituent[4]:
> > file:/Applications/apache-maven-3.2.3/lib/aether-util-0.9.0.M2.jar
> > constituent[5]:
> file:/Applications/apache-maven-3.2.3/lib/aopalliance-1.0.jar
> > constituent[6]: file:/Applications/apache-maven-3.2.3/lib/cdi-api-1.0.jar
> > constituent[7]:
> file:/Applications/apache-maven-3.2.3/lib/commons-cli-1.2.jar
> > constituent[8]:
> file:/Applications/apache-maven-3.2.3/lib/commons-io-2.2.jar
> > constituent[9]:
> file:/Applications/apache-maven-3.2.3/lib/commons-lang-2.6.jar
> > constituent[10]:
> file:/Applications/apache-maven-3.2.3/lib/guava-14.0.1.jar
> > constituent[11]:
> file:/Applications/apache-maven-3.2.3/lib/javax.inject-1.jar
> > constituent[12]:
> file:/Applications/apache-maven-3.2.3/lib/jsoup-1.7.2.jar
> > constituent[13]:
> file:/Applications/apache-maven-3.2.3/lib/jsr250-api-1.0.jar
> > constituent[14]:
> > file:/Applications/apache-maven-3.2.3/lib/maven-aether-provider-3.2.3.jar
> > constituent[15]:
> > 

setting planner max width

2015-10-28 Thread Hanifi GUNES
On a 10 node cluster, I am executing a query with the following

*alter session set `planner.width.max_per_node`=6;*

and see 153 minor fragments reported in the profiles tab whereas I would
expect a max parallelization of 60 cluster-wide.

Isn't this option bounding the max # of threads per query per node? Need a
second look here.


Thanks.
-Hanifi


Re: setting planner max width

2015-10-28 Thread Hanifi GUNES
Yup. That makes sense. All ~150 minor fragments seem to be under a single major
fragment though. I should dig in further to see what's going on here.

-H+

2015-10-28 16:14 GMT-07:00 Jacques Nadeau <jacq...@dremio.com>:

> Max width per node is per major fragment per node, not per query.
>
> So you should see no more than 60 minor fragments for any particular major
> fragment.
>
> Remember that in most cases, a multi-major-fragment query has blocking
> operations in it.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Oct 28, 2015 at 4:11 PM, Hanifi GUNES <h...@apache.org> wrote:
>
> > On a 10 node cluster, I am executing a query with the following
> >
> > *alter session set `planner.width.max_per_node`=6;*
> >
> > and see 153 minor fragments reported in the profiles tab whereas I would
> > expect a max parallelization of 60 cluster-wide.
> >
> > Isn't this option bounding the max # of threads per query per node?
> Need a
> > second look here.
> >
> >
> > Thanks.
> > -Hanifi
> >
>
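
For concreteness: with planner.width.max_per_node = 6 on 10 nodes, the cap works out to
6 x 10 = 60 minor fragments per major fragment, not per query, so a profile showing 153
minor fragments is consistent with a plan that has several major fragments (say, three
majors at up to 60 each). That is also why Hanifi's follow-up observation -- all ~150
sitting under a single major fragment -- still warrants the closer look he mentions.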


Re: [DISCUSS] Proposal to turn ValueVectors into separate reusable library & project

2015-10-26 Thread Hanifi Gunes
I was hoping to see this discussion happen sooner :) VVs have helped
Drill represent and move data around so flexibly that it would not be
hard to prove their usefulness to the community as a standalone library. I am
in support of this proposal.


-Hanifi

On Mon, Oct 26, 2015 at 2:19 PM, Jacques Nadeau  wrote:

> Drillers,
>
>
>
> A number of people have approached me recently about the possibility of
> collaborating on a shared columnar in-memory representation of data. This
> shared representation of data could be operated on efficiently with modern
> cpus as well as shared efficiently via shared memory, IPC and RPC. This
> would allow multiple applications to work together at high speed. Examples
> include moving back and forth between a library.
>
>
>
> As I was discussing these ideas with people working on projects including
> Calcite, Ibis, Kudu, Storm, Herron, Parquet and products from companies
> like MapR and Trifacta, it became clear that much of what the Drill
> community has already constructed is very relevant to the goals of a new
> broader interchange and execution format. (In fact, Ted and I actually
> informally discussed extracting this functionality as a library more than
> two years ago.)
>
>
>
> A standard will emerge around this need and it is in the best interest of
> the Drill community and the broader ecosystem if Drill’s ValueVectors
> concepts and code form the basis of a new library/collaboration/project.
> This means better interoperability, shared responsibility around
> maintenance and development and the avoidance of further division of the
> ecosystem.
>
>
>
> A little background for some: Drill is the first project to create a
> powerful language agnostic in-memory representation of complex columnar
> data. We've learned a lot over the last three years about how to interface
> with these structures, manage memory associated with them, adjust their
> sizes, expose them in builder patterns, etc. That work is useful for a
> number of systems and it would be great if we could share the learning. By
> creating a new, well documented and collaborative library, people could
> leverage this functionality in wider range of applications and systems.
>
>
>
> I’ve seen the great success that libraries like Parquet and Calcite have
> been able to achieve due to their focus on APIs, extensibility and
> reusability and I think we could do the same with the Drill ValueVector
> codebase. The fact that this would allow higher speed interchange among
> many other systems and becoming the standard for in-memory columnar
> exchange (as opposed to having to adopt an external standard) makes this a
> great opportunity to both benefit the Drill community and give back to the
> broader Apache community.
>
>
>
> As such, I’d like to open a discussion about taking this path. I think
> there would be various avenues of how to do this but my initial proposal
> would be to propose this as a new project that goes straight to a
> provisional TLP. We then would work to clean up layer responsibilities and
> extract pieces of the code into this new project where we collaborate with
> a wider group on a broader implementation (and more formal specification).
>
>
> Given the conversations I have had and the excitement and need for this, I
> think we should do this. If the community is supportive, we could probably
> see some really cool integrations around things like high-speed Python
> machine learning inside Drill operators before the end of the year.
>
>
>
> I’ll open a new JIRA and attach it here where we can start a POC &
> discussion of how we could extract this code.
>
>
> Looking forward to feedback!
>
>
> Jacques
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>


Re: [DISCUSS] Ideas to improve metadata cache read performance

2015-10-26 Thread Hanifi Gunes
I am not familiar with the contents of the stored metadata, but if the
deserialization workload fits any of afterburner's
claimed improvement points [1], it could well be worth trying given the
claimed gain on throughput is substantial.

It could also be a good idea to partition the cache over a number of files
for better parallelization, given that the number of cache files generated is
*significantly* less than the number of parquet files. Maintaining global
statistics seems an improvement point too.


-H+

1: https://github.com/FasterXML/jackson-module-afterburner#what-is-optimized

On Sun, Oct 25, 2015 at 9:33 AM, Aman Sinha  wrote:

> Forgot to include the link for Jackson's AfterBurner module:
>   https://github.com/FasterXML/jackson-module-afterburner
>
> On Sun, Oct 25, 2015 at 9:28 AM, Aman Sinha  wrote:
>
> > I was going to file an enhancement JIRA but thought I would discuss here
> > first:
> >
> > The parquet metadata cache file is a JSON file that contains a subset of
> > the metadata extracted from the parquet files.  The cache file can get
> > really large .. a few GBs for a few hundred thousand files.
> > I have filed a separate JIRA: DRILL-3973 for profiling the various
> aspects
> > of planning including metadata operations.  In the meantime, the
> timestamps
> > in the drillbit.log output indicate a large chunk of time spent in
> creating
> > the drill table to begin with, which indicates bottleneck in reading the
> > metadata.  (I can provide performance numbers later once we confirm
> through
> > profiling).
> >
> > A few thoughts around improvements:
> >  - The jackson deserialization of the JSON file is very slow.. can this
> be
> > speeded up ? .. for instance the AfterBurner module of jackson claims to
> > improve performance by 30-40% by avoiding the use of reflection.
> >  - The cache file read is a single threaded process.  If we were directly
> > reading from parquet files, we use a default of 16 threads.  What can be
> > done to parallelize the read ?
> >  - Any operation that can be done one time during the REFRESH METADATA
> > command ?  for instance..examining the min/max values to determine
> > single-value for partition column could be eliminated if we do this
> > computation during REFRESH METADATA command and store the summary one
> time.
> >
> >  - A pertinent question is: should the cache file be stored in a more
> > efficient format such as Parquet instead of JSON ?
> >
> > Aman
> >
> >
>
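
Wiring the module in is a one-liner wherever the cache file's ObjectMapper is built; a
minimal sketch (the construction site inside Drill's metadata code is left unnamed here,
and the jackson-module-afterburner artifact must be on the classpath):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

class MetadataMappers {
  // Afterburner swaps Jackson's reflection-based (de)serialization internals
  // for generated bytecode; the rest of the mapper usage is unchanged.
  static ObjectMapper newAfterburnerMapper() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(new AfterburnerModule());
    return mapper;
  }
}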


Re: [DISCUSS] Ideas to improve metadata cache read performance

2015-10-26 Thread Hanifi Gunes
One more thing: for workloads running queries over subsets of the same parquet
files, we can consider maintaining an in-memory cache as well, assuming the
metadata memory footprint per file is low and the parquet files are static, so
we would not need to invalidate the cache often.

H+

On Mon, Oct 26, 2015 at 2:10 PM, Hanifi Gunes <hgu...@maprtech.com> wrote:

> I am not familiar with the contents of the stored metadata, but if the
> deserialization workload fits any of afterburner's
> claimed improvement points [1], it could well be worth trying given the
> claimed gain on throughput is substantial.
>
> It could also be a good idea to partition the cache over a number of files
> for better parallelization, given that the number of cache files generated is
> *significantly* less than the number of parquet files. Maintaining global
> statistics seems an improvement point too.
>
>
> -H+
>
> 1:
> https://github.com/FasterXML/jackson-module-afterburner#what-is-optimized
>
> On Sun, Oct 25, 2015 at 9:33 AM, Aman Sinha <amansi...@apache.org> wrote:
>
>> Forgot to include the link for Jackson's AfterBurner module:
>>   https://github.com/FasterXML/jackson-module-afterburner
>>
>> On Sun, Oct 25, 2015 at 9:28 AM, Aman Sinha <amansi...@apache.org> wrote:
>>
>> > I was going to file an enhancement JIRA but thought I would discuss here
>> > first:
>> >
>> > The parquet metadata cache file is a JSON file that contains a subset of
>> > the metadata extracted from the parquet files.  The cache file can get
>> > really large .. a few GBs for a few hundred thousand files.
>> > I have filed a separate JIRA: DRILL-3973 for profiling the various
>> aspects
>> > of planning including metadata operations.  In the meantime, the
>> timestamps
>> > in the drillbit.log output indicate a large chunk of time spent in
>> creating
>> > the drill table to begin with, which indicates bottleneck in reading the
>> > metadata.  (I can provide performance numbers later once we confirm
>> through
>> > profiling).
>> >
>> > A few thoughts around improvements:
>> >  - The jackson deserialization of the JSON file is very slow.. can this
>> be
>> > speeded up ? .. for instance the AfterBurner module of jackson claims to
>> > improve performance by 30-40% by avoiding the use of reflection.
>> >  - The cache file read is a single threaded process.  If we were
>> directly
>> > reading from parquet files, we use a default of 16 threads.  What can be
>> > done to parallelize the read ?
>> >  - Any operation that can be done one time during the REFRESH METADATA
>> > command ?  for instance..examining the min/max values to determine
>> > single-value for partition column could be eliminated if we do this
>> > computation during REFRESH METADATA command and store the summary one
>> time.
>> >
>> >  - A pertinent question is: should the cache file be stored in a more
>> > efficient format such as Parquet instead of JSON ?
>> >
>> > Aman
>> >
>> >
>>
>
>


Re: List type

2015-10-19 Thread Hanifi Gunes
If I am not wrong currently we use
i) RepeatedInt for single
ii) RepeatedList of RepeatedInt for double
iii) RepeatedList of RepeatedList of RepeatedInt for triple arrays.

I think we should refactor vector design in such way that we will only have
a ListVector eliminating the need for all Repeated* vectors as well as code
generation for those so that we would represent all these above types via
i) ListVector of IntVector
ii) ListVector of ListVector of IntVector
iii) ListVector of ListVector of ListVector of IntVector

The idea here is to favor aggregation over inheritance, which is less
redundant and more powerful. Thinking about it, we do not even need to
maintain RepeatedMapVector as it will simply be ListVector of MapVector in
the new dialect.

-Hanifi

ps: As an fyi, even though it does not include a JIRA for abstracting out a
ListVector which I discussed over the past months with many devs, [1] has a
list of items in place for refactoring vectors (and possibly the type
system).

1: https://issues.apache.org/jira/browse/DRILL-2147


On Mon, Oct 19, 2015 at 1:28 PM, Julien Le Dem  wrote:

> I'm looking at the type system in Drill and I have the following question:
> Why is there a LIST type and a REPEATED field?
> It sounds like there should only one of those 2 concepts.
> Could someone describe how the following are represented?
> - one dimensional list of int
> - 2 dimensional list of ints
> - 3 dimensional list of ints
> Thank you
>
> --
> Julien
>


Re: List type

2015-10-19 Thread Hanifi Gunes
Sounds great. I will look at both the union vector and type promotion stuff
very soon. It would be nice if we could work on bringing ListVector
alive as well. I will file a JIRA for this.

On Mon, Oct 19, 2015 at 2:43 PM, Steven Phillips <ste...@dremio.com> wrote:

> In the work I did for the Union types, (see PR
> https://github.com/apache/drill/pull/207), I actually went down that exact
> path. In that branch, if Union type is enable, any vectors created through
> the ComplexWriter interface will not create any Repeated type vectors.
>
> On Mon, Oct 19, 2015 at 2:29 PM, Hanifi Gunes <hgu...@maprtech.com> wrote:
>
> > If I am not wrong currently we use
> > i) RepeatedInt for single
> > ii) RepeatedList of RepeatedInt for double
> > iii) RepeatedList of RepeatedList of RepeatedInt for triple arrays.
> >
> > I think we should refactor vector design in such way that we will only
> have
> > a ListVector eliminating the need for all Repeated* vectors as well as
> code
> > generation for those so that we would represent all these above types via
> > i) ListVector of IntVector
> > ii) ListVector of ListVector of IntVector
> > iii) ListVector of ListVector of ListVector of IntVector
> >
> > The idea here is to favor aggregation over inheritance, which is less
> > redundant and more powerful. Thinking about it, we do not even need to
> > maintain RepeatedMapVector as it will simply be ListVector of MapVector
> in
> > the new dialect.
> >
> > -Hanifi
> >
> > ps: As an fyi, even though it does not include a JIRA for abstracting
> out a
> > ListVector which I discussed over the past months with many devs, [1]
> has a
> > list of items in place for refactoring vectors (and possibly the type
> > system).
> >
> > 1: https://issues.apache.org/jira/browse/DRILL-2147
> >
> >
> > On Mon, Oct 19, 2015 at 1:28 PM, Julien Le Dem <jul...@dremio.com>
> wrote:
> >
> > > I'm looking at the type system in Drill and I have the following
> > question:
> > > Why is there a LIST type and a REPEATED field?
> > > It sounds like there should only one of those 2 concepts.
> > > Could someone describe how the following are represented?
> > > - one dimensional list of int
> > > - 2 dimensional list of ints
> > > - 3 dimensional list of ints
> > > Thank you
> > >
> > > --
> > > Julien
> > >
> >
>


Re: Improvements to storage plugin planning integration support

2015-10-12 Thread Hanifi Gunes
I would +1 (1-3) for sure. I do not have much understanding of programs;
however, additional flexibility for storage plugin devs sounds cool in
general when used responsibly =) so +0 for (4).
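
As a strawman, here is roughly what I would picture a plugin-side
implementation of the proposed hook looking like. Planner, Phase, Rule and
the example class below are placeholder stand-ins for whatever the final
API exposes, not real Drill or Calcite types.

enum Phase { LOGICAL, PHYSICAL }

interface Rule {}

interface Planner {
  void addRule(Rule rule);
}

interface PlannerIntegration {
  void initialize(Planner planner, Phase phase);
}

// A hypothetical storage plugin registers its rules only in the phases
// where they apply, instead of registering everything everywhere.
class ExamplePlannerIntegration implements PlannerIntegration {
  @Override
  public void initialize(Planner planner, Phase phase) {
    if (phase == Phase.LOGICAL) {
      planner.addRule(new Rule() {}); // e.g. a push-down rule
    }
  }
}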


-H+

On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau  wrote:

> The dead air must mean that everyone is onboard with my recommendation
>
> PlannerIntegration StoragePlugin.getPlannerIntegrations()
>
> interface PlannerIntegration{
>   void initialize(Planner, Phase)
> }
>
> Right :D
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau  wrote:
>
> > A number of us were meeting last week to work through integrating the
> > Phoenix storage plugin. This plugin is interesting because it also uses
> > Calcite for planning. In some ways, this should make integration easy.
> > However, it also allowed us to see certain constraints in how we expose
> > planner integration between storage plugins and Drill internals.
> > Currently, Drill asks the plugin to provide a set of optimizer rules
> which
> > it incorporates into one of the many stages of planning. This is too
> > constraining in a few ways:
> >
> > 1. it doesn't allow a plugin to decide which phase of planning to
> > integrate with. (This was definitely a problem in the Phoenix case. Our
> > hack solution for now is to incorporate storage plugin rules in several
> > phases instead of just one [1].)
> > 2. it doesn't allow arbitrary transformations. Calcite provides a program
> > concept. It may be that a plugin needs to do some of its own work using
> the
> > Hep planner. Currently there isn't an elegant way to do this in the
> context
> > of the rule.
> > 3. There is no easy way to incorporate additional planner initialization
> > options. This was almost a problem in the case of the JDBC plugin. It
> > turned out that a hidden integration using register() here [2] allowed us
> > to continue throughout the planning phases. However, we have to register
> > all the rules for all the phases of planning which is a bit unclean.
> We're
> > hitting the same problem in the case of Phoenix where we need to register
> > materialized views as part of planner initialization but the hack from
> the
> > JDBC case won't really work.
> >
> > I suggest we update the interface to allow better support for these types
> > of integrations.
> >
> > These seem to be the main requirements:
> > 1. Expose concrete planning phases to storage plugins
> > 2. Allow a storage plugin to provide additional planner initialization
> > behavior
> > 3. Allow a storage plugin to provide rules to include in a particular
> > planning phase (merged with other rules during that phase).
> > 4. (possibly) allow a storage plugin to provide transformation programs
> > that are to be executed in between the concrete planning phases.
> >
> > Item (4) above is the most questionable to me as I wonder whether or not
> > this could simply be solved by creating a transformation rule (or program
> > rule in Calcite's terminology) that creates an alternative tree and thus
> be
> > solved by (3).
> >
> > A simple solution might be (if we ignore #4):
> >
> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
> >
> > interface PlannerIntegration{
> >   void initialize(Planner, Phase)
> > }
> >
> > This way, a storage plugin could register rules (or materialized views)
> at
> > setup time.
> >
> > What do others think?
> >
> > [1]
> >
> https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
> > [2]
> >
> https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
>


Re: Apache Drill: How does the plug-in know that an aggregate function is applied

2015-08-10 Thread Hanifi Gunes
+dev

+1 to Hakim. AbstractRR#isSkipQuery is the way to go. If you want more
details on this, you should check out DRILL-2358 [1], an umbrella issue
that targets making count(*) queries more efficient per storage plugin.
Currently (I guess) only the JSON and Mongo(?) readers support it.

1:
https://github.com/apache/drill/commit/54df129cab544c3df8e75a7dae3f85a91a9ded5a
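
For anyone looking for a starting point, here is a self-contained sketch of
the pattern. SkippingReaderBase below is a simplified stand-in for
AbstractRecordReader that models only the isSkipQuery() hint; the real
reader plumbing is elided.

// Stand-in for AbstractRecordReader; models only the skip-query hint.
abstract class SkippingReaderBase {
  private final boolean skipQuery;

  SkippingReaderBase(boolean skipQuery) { this.skipQuery = skipQuery; }

  // True when the query needs record counts only (e.g. count(*)).
  protected boolean isSkipQuery() { return skipQuery; }

  abstract int next();
}

class CountOptimizedReader extends SkippingReaderBase {
  private int remainingRows;

  CountOptimizedReader(boolean skipQuery, int totalRows) {
    super(skipQuery);
    this.remainingRows = totalRows;
  }

  @Override
  int next() {
    if (isSkipQuery()) {
      // Fast path: report the count without decoding any column data.
      int count = remainingRows;
      remainingRows = 0;
      return count;
    }
    // Normal path: decode and emit the projected columns (elided).
    return 0;
  }
}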

On Thu, Aug 6, 2015 at 6:50 PM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 Hi Sudip,

 I'm not really an expert in this matter, but I recently came across the
 isSkipQuery() method in AbstractRecordReader; its javadoc states:

 *Returns true if reader should skip all of the columns, reporting number of
  records only. Handling of a skip query is storage plugin-specific.*


 You can take a look at JSONRecordReader for an example on how to use
 isSkipQuery() to optimize the reading.

 Thanks

 On Thu, Aug 6, 2015 at 2:01 AM, Sudip Mukherjee 
 mukherjeesud...@hotmail.com
  wrote:

  Hi,
  I have recently been using Apache Drill with MongoDB and am trying to
  write a basic plug-in for Apache Solr. I wanted to know how the plug-in
  knows that a count(*) query has been applied so that the query to the
  data source can be optimized. Can I get it if I extend the
  AbstractExprVisitor class?
 
  Thanks,
  Sudip Mukherjee




 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
 



Re: anyone seen these errors on master ?

2015-08-05 Thread Hanifi Gunes
Did you tighten your memory settings? How many forks are you running with?
I bet you are truly running out of memory while executing this particular
test case.

-H+

On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
wrote:

 b2bbd99 committed on July 6th introduced the test.

  On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote:
 
  In that case, we probably need to do a binary search to figure out which
  recent patch is causing this problem.
 
  On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
  wrote:
 
  Just got those errors on master too
 
  On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
 
  wrote:
 
  I'm seeing those errors intermittently when building my private
 branch, I
  don't believe I made any change that would have caused them. Anyone
 seen
  them too ?
 
 
 
 testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 2.043 sec <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but
  was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139)
  at
 
 
 org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125)
 
 
 
 
 testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.436 sec <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but
  was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at
 
 
 org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187)
  at
 
 
 org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177)
  at
 
 
 org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85)
 
 
 
 
 testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.788 sec <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but
  was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 

Re: anyone seen these errors on master ?

2015-08-05 Thread Hanifi Gunes
I don't seem to be able to re-prod this. Let me look at this and update you
all.

On Thu, Aug 6, 2015 at 12:03 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 I didn't make any change, I'm running 2 forks (the default). I got those
 errors 3 times now, 2 on a linux VM and 1 on a linux physical node

 On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote:

  Did you tighten your memory settings? How many forks are you running
 with?
  I bet you are truly running out of memory while executing this particular
  test case.
 
  -H+
 
  On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
  wrote:
 
   b2bbd99 committed on July 6th introduced the test.
  
On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com
 wrote:
   
In that case, we probably need to do a binary search to figure out which
recent patch is causing this problem.
   
On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
   adene...@maprtech.com
wrote:
   
Just got those errors on master too
   
On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
   adene...@maprtech.com
   
wrote:
   
I'm seeing those errors intermittently when building my private
   branch, I
don't believe I made any change that would have caused them. Anyone
   seen
them too ?
   
   
   
  
 
 testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
Time elapsed: 2.043 sec   ERROR!
java.lang.Exception: Unexpected exception,
   
  expectedorg.apache.drill.exec.exception.OversizedAllocationException
but
wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at
   
   
  
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
at
   
   
  
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
at
   
   
  
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
at
   
   
  
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at
   
   
  
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
at
   
   
  
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
at
   
   
  
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
at
   
   
  
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
at
  org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139)
at
   
   
  
 
 org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125)
   
   
   
   
  
 
 testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
Time elapsed: 0.436 sec   ERROR!
java.lang.Exception: Unexpected exception,
   
  expectedorg.apache.drill.exec.exception.OversizedAllocationException
but
wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at
   
   
  
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
at
   
   
  
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
at
   
   
  
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
at
   
   
  
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at
   
   
  
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
at
   
   
  
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
at
   
   
  
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
at
   
   
  
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
at
   
   
  
 
 org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187)
at
   
   
  
 
 org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177)
at
   
   
  
 
 org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85)
   
   
   
   
  
 
 testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
Time elapsed: 0.788 sec   ERROR!
java.lang.Exception: Unexpected exception,
   
  expectedorg.apache.drill.exec.exception.OversizedAllocationException
but
wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException
at java.nio.Bits.reserveMemory(Bits.java:658

Re: Request - hold off on merging to master for 48 hours

2015-07-31 Thread Hanifi Gunes
+1 to Jacques, it'd be nice if we had the core changes easily re/viewable.
Also, would it make sense at this point to split the change set into
smaller patches, as there seems to be more work to do now?


H+

On Fri, Jul 31, 2015 at 7:06 PM, Jacques Nadeau jacq...@dremio.com wrote:

 That sounds frustrating.

 I agree that we need to get this merged.  The old allocator is sloppy about
 accounting at best. Let's work together on trying to come up with a
 solution. Can you point us at the current branch so other people can
 provide some brainstorming?

 --
 Jacques Nadeau
 CTO and Co-Founder, Dremio

 On Thu, Jul 30, 2015 at 4:00 PM, Chris Westin chriswesti...@gmail.com
 wrote:

  Short version: I'll call it quits on the merge moratorium for now. Thank
  you to everyone for participating. Merge away.
 
  In the precommit suite, one query fails with an illegal reference
 counting
  exception from the external sort, and Steven has found that for me. This
 is
  the closest I've ever gotten. On future attempts to commit after
 rebasing,
  I'm going to be counting on other file owners a lot more to get through
  that quickly, rather than trying to find all the newly introduced
 problems
  myself.
 
  Long version: when I run the performance suite, the results with the
  non-locking version of the allocator are terrible. Worse than the locking
  implementation of the allocator (I still have both on separate branches).
  When we ran this on the locking implementation, there was roughly a 20%
  performance degradation, and consensus was that this was too much to
 accept
  the change. The locking implementation uses a single lock for all
  allocators. (Yes, I know that sounds heavy-handed, but it wasn't the
 first
  choice. There was a prior implementation that used a lock per allocator,
  but that one got deadlocks all the time because it couldn't ensure
  consistent lock acquisition orders when allocators went to their parents
 to
  get more space, combined with allocators locking each other to transfer
 or
  share buffer ownership.)
 
  I thought I'd solve this with a non-locking implementation. In this
  version, the variables that are used to track the state of an allocator
 re
  its available space, and how it is used, are kept in a small inner class;
  the allocator has an AtomicReference to that. A space allocation consists
  of getting that reference, making a clone of it, and then making all the
  necessary changes to the clone. To commit the space transaction, I try to
  swap it in with AtomicReference.compareAndSet(). If that fails, the
  transaction is retried. I expected that there would be no failures with
  leaf allocators, because they're only used by the thread doing the
  fragment's work. Only the root should have seen contention. But the
  performance cluster test showed the performance for this implementation
 to
  be five times worse than the current master (yes 5x, not just 20% worse
  like the locking implementation). I've done some quick sanity checks
 today,
  but don't see anything obviously silly. I will investigate a little
 further
  -- I've already come up with a couple of potential issues, but I need to
 do
  a couple of experiments with it over the next few hours (which wouldn't
  leave enough time to do the merge by the 48-hour deadline).
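
  In sketch form, the transaction looks roughly like this (class and field
  names are illustrative, not the actual allocator code):

  import java.util.concurrent.atomic.AtomicReference;

  class LockFreeAccountant {
    // Immutable snapshot of the allocator's space accounting.
    private static final class State {
      final long allocated;
      final long limit;

      State(long allocated, long limit) {
        this.allocated = allocated;
        this.limit = limit;
      }
    }

    private final AtomicReference<State> state =
        new AtomicReference<>(new State(0, 1 << 20));

    // Try to reserve 'size' bytes; false means out of space.
    boolean reserve(long size) {
      while (true) {
        State current = state.get();
        if (current.allocated + size > current.limit) {
          return false; // transaction aborted: over the limit
        }
        State updated = new State(current.allocated + size, current.limit);
        if (state.compareAndSet(current, updated)) {
          return true;  // transaction committed
        }
        // another thread won the race; retry against the fresh state
      }
    }
  }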
 
  If I can't overcome those issues, then I will at least go for obtaining
  the root allocator from a factory, and set things up so that the current
  and new allocators can co-exist, because the new one definitely catches a
  lot more problems -- we should be running tests with it on. Hopefully I
  can overcome the issues shortly, because I think the accounting is much
  better (that's why it catches more problems), and we need that in order
  to find our ongoing slow memory leak.
 
  On Wed, Jul 29, 2015 at 4:00 PM, Jacques Nadeau jacq...@dremio.com
  wrote:
 
   Makes sense.
  
   --
   Jacques Nadeau
   CTO and Co-Founder, Dremio
  
   On Wed, Jul 29, 2015 at 3:32 PM, Chris Westin chriswesti...@gmail.com
 
   wrote:
  
Ordinarily, I would agree. However, in this particular case, some
 other
folks wanted me closer to master so they could use my branch to track
   down
problems in new code. Also, the problems I was seeing were in code
 I'm
   not
familiar with, but there had been several recent commits claiming to
  fix
memory issues there. So I wanted to see if the problems I was seeing
  had
been taken care of. Sure enough, my initial testing shows that the
   problems
I was trying to fix had already been fixed by others -- they went
 away
after I rebased. In this case, chasing master saved me from having to
   track
all of those down myself, and duplicating the work. I'm hoping that
  there
weren't any significant new ones introduced. Testing is proceeding.
   
On Wed, Jul 29, 2015 at 1:59 PM, Parth Chandra par...@apache.org
   wrote:
   
 I think the idea (some of the idea is mine I'm afraid) is 

[jira] [Created] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2015-07-29 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-3577:
---

 Summary: Counting nested fields on CTAS-created-parquet file/s 
reports inaccurate results
 Key: DRILL-3577
 URL: https://issues.apache.org/jira/browse/DRILL-3577
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.1.0
Reporter: Hanifi Gunes
Assignee: Mehant Baid
Priority: Critical


I have not tried this at a smaller scale nor on the JSON file directly, but 
the following seems to reproduce the issue.

1. Create an input file as follows
20K rows with the following - 
{"some":"yes","others":{"other":true,"all":false,"sometimes":"yes"}}
200 rows with the following - 
{"some":"yes","others":{"other":true,"all":false,"sometimes":"yes","additional":"last entries only"}}

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}

This should read

{code}
Fragment Number of records written
0_0 20200
{code}

3. Count on nested fields via
{code:sql}
select count(t.others.additional) from dfs.`tmp`.`tp` t
OR
select count(t.others.other) from dfs.`tmp`.`tp` t
{code}

reports no rows as follows

{code}
EXPR$0
0
{code}

While
{code:sql}
select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
null
{code}

reports expected 200 rows

{code}
EXPR$0
200
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with schema change is not written (and hence not read back ) correctly

2015-07-29 Thread Hanifi Gunes
Just an fyi I dropped a comment under the issue.

-H+

On Wed, Jul 29, 2015 at 5:40 PM, Hanifi Gunes hgu...@maprtech.com wrote:

 Would you attach a sample input file manifesting the problem? My
 impression from outset was that a field selection bug that we recently
 fixed might have caused this.


 Thanks.
 -Hanifi

 On Wed, Jul 29, 2015 at 5:07 PM, Stefán Baxter ste...@activitystream.com
 wrote:

 Hi,

 I think that this problem only showed itself for large datasets where
 assumptions were being made after 1k records.

 Were you able to reproduce this with a smaller set?

 Regards,
  -Stefan


 On Wed, Jul 29, 2015 at 2:01 PM, Hanifi Gunes (JIRA) j...@apache.org
 wrote:

 
   [
 
 https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]
 
  Hanifi Gunes resolved DRILL-3551.
  -
  Resolution: Fixed
 
  Tested on a small input file of 20 mixed records with and w/o the
  additional field. Looks like the good old field projection problem
 surfaces
  here. So quite likely fixed by DRILL-3476. Please re-open attaching an
  input file if not fixed.
 
   CTAS from complex Json source with schema change  is not written (and
  hence not read back ) correctly
  
 
 -
  
   Key: DRILL-3551
   URL: https://issues.apache.org/jira/browse/DRILL-3551

   Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
  Affects Versions: 1.1.0
  Reporter: Parth Chandra
  Assignee: Hanifi Gunes
  Priority: Critical
   Fix For: 1.2.0
  
  
   The source data contains -
   20K rows with the following -
  
 {some:yes,others:{other:true,all:false,sometimes:yes}}
   200 rows with the following -
  
 
 {some:yes,others:{other:true,all:false,sometimes:yes,additional:last
   entries only}}
   Creating a table and reading it back returns incorrect data -
   CREATE TABLE testparquet as select * from `test.json`;
   SELECT * from testparquet;
   Yields
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   The additional field is missing in all records
   Parquet metadata for the created file does not have the 'additional'
  field
 
 
 
  --
  This message was sent by Atlassian JIRA
  (v6.3.4#6332)
 





Re: Request - hold off on merging to master for 48 hours

2015-07-29 Thread Hanifi Gunes
I am fine with holding off my check-ins until Friday noon.

On Wed, Jul 29, 2015 at 9:41 PM, Chris Westin chriswesti...@gmail.com
wrote:

 I've got a large patch that includes a completely rewritten direct memory
 allocator (replaces TopLevelAllocator).

 The space accounting is much tighter than with the current implementation,
 and it catches a lot more problems than the current implementation does. It
 also fixes issues with accounting around the use of shared buffers, and
 buffer ownership transfer (used by the RPC layer to hand off buffers to
 fragments that will do work).

 It's been an ongoing battle to get this in, because every time I get close,
 I rebase, and it finds more new problems (apparently introduced by other
 work done since my last rebase). These take time to track down and fix,
 because they're often in areas of the code I don't know.

 It looks like I'm very close right now. I rebased against apache/master on
 Friday. All the unit tests passed. All of our internal tests passed except
 for one query, which takes an IllegalReferenceCountException (it looks like
 a DrillBuf is being released one more time than it should be).

 So, in order to keep the gap from getting wide again (it looks like I'm
 already a couple of commits behind, but hopefully they don't introduce more
 issues), I'm asking that folks hold off on merging into master for 48 hours
 from now -- that's until about noon on Friday PST. I'm hoping that will
 give me the time needed to finally get this in. If things go wrong with my
 current patching, or I discover other problems, or can't find the illegal
 reference count issue by then, I'll post a message and open things up
 again. Meanwhile, you can still pull, do work, make pull requests, and get
 them reviewed; just don't merge them to master.

 Can we agree to this?

 Chris



[jira] [Resolved] (DRILL-3550) Incorrect results reading complex data with schema change

2015-07-29 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-3550.
-
Resolution: Fixed

Fixed by DRILL-3476.

 Incorrect results reading complex data with schema change
 -

 Key: DRILL-3550
 URL: https://issues.apache.org/jira/browse/DRILL-3550
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.1.0
Reporter: Parth Chandra
Assignee: Hanifi Gunes
Priority: Critical
 Fix For: 1.2.0


 Given the data : 
 {"some":"yes","others":{"other":true,"all":false,"sometimes":"yes"}}
 {"some":"yes","others":{"other":true,"all":false,"sometimes":"yes","additional":"last entries only"}}
 The query 
 select `some`, t.others, t.others.additional from `test.json` t;
  produces incorrect results - 
 | yes  | {"additional":"last entries only"}  | last entries only  |
 instead of 
 | yes  | {"other":true,"all":false,"sometimes":"yes","additional":"last entries only"}  | last entries only  |



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with schema change is not written (and hence not read back ) correctly

2015-07-29 Thread Hanifi Gunes
Would you attach a sample input file manifesting the problem? My impression
from outset was that a field selection bug that we recently fixed might
have caused this.


Thanks.
-Hanifi

On Wed, Jul 29, 2015 at 5:07 PM, Stefán Baxter ste...@activitystream.com
wrote:

 Hi,

 I think that this problem only showed itself for large datasets where
 assumptions were being made after 1k records.

 Were you able to reproduce this with a smaller set?

 Regards,
  -Stefan


 On Wed, Jul 29, 2015 at 2:01 PM, Hanifi Gunes (JIRA) j...@apache.org
 wrote:

 
   [
 
 https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]
 
  Hanifi Gunes resolved DRILL-3551.
  -
  Resolution: Fixed
 
  Tested on a small input file of 20 mixed records with and w/o the
  additional field. Looks like the good old field projection problem
 surfaces
  here. So quite likely fixed by DRILL-3476. Please re-open attaching an
  input file if not fixed.
 
   CTAS from complex Json source with schema change  is not written (and
  hence not read back ) correctly
  
 
 -
  
   Key: DRILL-3551
   URL: https://issues.apache.org/jira/browse/DRILL-3551
   Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
  Affects Versions: 1.1.0
  Reporter: Parth Chandra
  Assignee: Hanifi Gunes
  Priority: Critical
   Fix For: 1.2.0
  
  
   The source data contains -
   20K rows with the following -
  
 {some:yes,others:{other:true,all:false,sometimes:yes}}
   200 rows with the following -
  
 
 {some:yes,others:{other:true,all:false,sometimes:yes,additional:last
   entries only}}
   Creating a table and reading it back returns incorrect data -
   CREATE TABLE testparquet as select * from `test.json`;
   SELECT * from testparquet;
   Yields
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   | yes  | {other:true,all:false,sometimes:yes}  |
   The additional field is missing in all records
   Parquet metadata for the created file does not have the 'additional'
  field
 
 
 
  --
  This message was sent by Atlassian JIRA
  (v6.3.4#6332)
 



[jira] [Resolved] (DRILL-3551) CTAS from complex Json source with schema change is not written (and hence not read back ) correctly

2015-07-29 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-3551.
-
Resolution: Fixed

Tested on a small input file of 20 mixed records with and w/o the additional 
field. Looks like the good old field projection problem surfaces here. So quite 
likely fixed by DRILL-3476. Please re-open attaching an input file if not fixed.

 CTAS from complex Json source with schema change  is not written (and hence 
 not read back ) correctly
 -

 Key: DRILL-3551
 URL: https://issues.apache.org/jira/browse/DRILL-3551
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.1.0
Reporter: Parth Chandra
Assignee: Hanifi Gunes
Priority: Critical
 Fix For: 1.2.0


 The source data contains - 
 20K rows with the following - 
 {"some":"yes","others":{"other":true,"all":false,"sometimes":"yes"}}
 200 rows with the following - 
 {"some":"yes","others":{"other":true,"all":false,"sometimes":"yes","additional":"last entries only"}}
 Creating a table and reading it back returns incorrect data - 
 CREATE TABLE testparquet as select * from `test.json`;
 SELECT * from testparquet;
 Yields 
 | yes  | {"other":true,"all":false,"sometimes":"yes"}  |
 | yes  | {"other":true,"all":false,"sometimes":"yes"}  |
 | yes  | {"other":true,"all":false,"sometimes":"yes"}  |
 | yes  | {"other":true,"all":false,"sometimes":"yes"}  |
 The additional field is missing in all records
 Parquet metadata for the created file does not have the 'additional' field 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Threads left after Drillbit shutdown (in dev./unit tests)

2015-07-10 Thread Hanifi Gunes
Is there any way to reproduce this at a smaller scale? Have you tried
failing a couple of tests and dumping threads?
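
For what it's worth, threads can also be dumped from within the test itself
rather than by attaching jstack externally; a minimal sketch:

import java.util.Map;

// Prints every live thread and its stack to stderr; handy to call
// from a test's timeout or failure path.
class ThreadDumper {
  static void dumpAllThreads() {
    for (Map.Entry<Thread, StackTraceElement[]> entry :
        Thread.getAllStackTraces().entrySet()) {
      System.err.println(entry.getKey());
      for (StackTraceElement frame : entry.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }
}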

-Hanifi
Thanks

On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay dbarc...@maprtech.com
wrote:

 Is Drill terminating threads correctly?

 In running jstack on a JVM running a dev. test run that ended up hung
 after getting about three test timeout errors, I see that there are
 409 threads.

 Although 138 of those are not-unexpected ShutdownHook threads (since
 many tests are run in one VM), there are:
 - 138 WorkManager.StatusThread threads (hmm 138 again)
 -   7 Client-1 threads
 -   4 UserServer-1 threads
 -  21 BitClient-1 threads
 -   4 BitClient-2 threads
 -   3 BitClient-3 threads
 -   8 BitServer-1 threads
 -   8 BitServer-2 threads
 -   7 BitServer-3 threads
 -   7 BitServer-4 threads
 -   7 BitServer-5 threads
 -   6 BitServer-6 threads
 -   6 BitServer-7 threads
 -   6 BitServer-8 threads
 -   5 BitServer-9 threads
 -   5 BitServer-10 threads
 (Other thread names have only 1 or 2 occurrences.)

 Regarding the 4 for the number of UserServer-1 threads:  Three test
 methods had timeout failures plus one got hung.


 Here's the tail end of the output from the test run, including
 all the timeout errors and the hang (except for repeated
 query-results data lines).



 dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install

 TRIMMED

 Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec
 - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
 Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
 Running
 org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
 00:44:34.017 [globalEventExecutor-1-523] ERROR
 o.a.z.server.NIOServerCnxnFactory - Thread
 Thread[globalEventExecutor-1-523,5,main] died
 java.lang.AssertionError: null
 at
 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
 Running
 org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
 1
 Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599
 sec <<< FAILURE! - in
 org.apache.drill.exec.physical.impl.TestBroadcastExchange
 TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
 Time elapsed: 50.063 sec <<< ERROR!
 java.lang.Exception: test timed out after 50000 milliseconds
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:503)
 at
 io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
 at
 io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
 at
 io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
 at
 org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at
 org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at
 org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
 at
 org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)

 

[jira] [Resolved] (DRILL-3031) Project missing a column (empty repeated type)

2015-07-08 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-3031.
-
Resolution: Not A Problem

Drill's way of handling list types is lazy, meaning that we don't materialize 
a column until we find a scalar type within the list.

 Project missing a column (empty repeated type)
 --

 Key: DRILL-3031
 URL: https://issues.apache.org/jira/browse/DRILL-3031
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Reporter: Rahul Challapalli
Assignee: Hanifi Gunes
 Fix For: 1.2.0


 git.commit.id.abbrev=4689468
 Data :
 {code}
 {"id":1, "arr":[]}
 {code}
 Query :
 {code}
 0: jdbc:drill:schema=dfs_eea> select * from `file1.json`;
 ++
 | id |
 ++
 | 1  |
 ++
 1 row selected (0.131 seconds)
 {code}
 Drill did not report the second column here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36292: DRILL-2838: flatten after join failing

2015-07-08 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36292/#review91027
---



exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlatten.java
 (line 65)
https://reviews.apache.org/r/36292/#comment144255

I am not sure how differently flatten is planned, as it is a UDF-looking 
operator. However, should we not flatten the same field at least twice to 
test this patch?


- Hanifi Gunes


On July 8, 2015, 5:44 a.m., Jason Altekruse wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36292/
 ---
 
 (Updated July 8, 2015, 5:44 a.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Bugs: DRILL-2838
 https://issues.apache.org/jira/browse/DRILL-2838
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 This issue was fixed for project recently; it relates to incomplete schema 
 population. I applied the same fix to the flatten operator. I tried to get 
 some initial work done on refactoring to more appropriately share code between 
 flatten and project. This is a bit large in scope, so I have added a new JIRA 
 to track this: https://issues.apache.org/jira/browse/DRILL-3471
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
  491ced3 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlatten.java
  39e36eb 
   
 exec/java-exec/src/test/resources/flatten/complex_transaction_example_data_modified.json
  PRE-CREATION 
   
 exec/java-exec/src/test/resources/flatten/complex_transaction_repeated_map_double_copy_baseline.json
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36292/diff/
 
 
 Testing
 ---
 
 Testing in progress, relevent related unit tests are passing
 
 
 Thanks,
 
 Jason Altekruse
 




Re: Review Request 36229: Patch for DRILL-1750

2015-07-08 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36229/#review91030
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
(line 213)
https://reviews.apache.org/r/36229/#comment144259

I remember a similar problem where vectors in a container had different 
numbers of values. Would it be better to move this logic into 
VectorContainer#setRecordCount to ensure vectors are aligned within a container?
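
Something like the following is what I have in mind. This is a rough sketch
with simplified stand-in types, not the actual container API:

import java.util.ArrayList;
import java.util.List;

class ContainerSketch {
  interface Vector {
    void setValueCount(int count);
  }

  private final List<Vector> vectors = new ArrayList<>();
  private int recordCount;

  // Propagating the count from the container keeps every child vector
  // aligned, so individual operators cannot forget one of them.
  void setRecordCount(int count) {
    recordCount = count;
    for (Vector vector : vectors) {
      vector.setValueCount(count);
    }
  }
}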


- Hanifi Gunes


On July 6, 2015, 11:39 p.m., Steven Phillips wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36229/
 ---
 
 (Updated July 6, 2015, 11:39 p.m.)
 
 
 Review request for drill.
 
 
 Bugs: DRILL-1750
 https://issues.apache.org/jira/browse/DRILL-1750
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-1750: Set value count on all outgoing vectors in ScanBatch
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
  6bf1280ae09045a4d73d566c25d624acced6a68d 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/store/json/TestJsonRecordReader.java
  bb1af9eb2e6ab4950c166b8057680fff175c7a3f 
   exec/java-exec/src/test/resources/jsoninput/1750/a.json PRE-CREATION 
   exec/java-exec/src/test/resources/jsoninput/1750/b.json PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36229/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steven Phillips
 




Re: Apache drill with apache spark

2015-07-06 Thread Hanifi Gunes
+dev

Sounds cool. I have some plans to check the existing code as soon as time
permits. The existing code is forked off from ~0.5 and relies on Spark
1.1.0. We should discuss possible contribution areas once I check the code
in.

-Hanifi

On Mon, Jul 6, 2015 at 3:49 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:

 Yes, if I can help you guys in this regard then I will contribute to this
 project. It would be a pleasure to work on it.



 On Tue, Jul 7, 2015 at 2:46 AM, Parth Chandra par...@apache.org wrote:

  Hi Hafiz,
 
We did do some work on integrating with Spark a while ago but never
  published the work.
 
We are looking at a way to make the work available, but I don't have a
  timeline for you.
 
Is this something you might be interested in contributing to?
 
  Parth
 
  On Fri, Jul 3, 2015 at 2:13 AM, Hafiz Mujadid hafizmujadi...@gmail.com
  wrote:
 
   Hi All!
  
   I am new to apache drill and I want to know whether apache drill has
 some
   integration with apache spark or not. I read that work on integration
  with
   spark is in progress, What is the expected time to make it available
 for
   users?
  
  
   Thanks
  
 



 --
 Regards: HAFIZ MUJADID



[jira] [Resolved] (DRILL-2851) Memory LEAK - FLATTEN function fails when input array has 99,999 integer type elements

2015-07-06 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes resolved DRILL-2851.
-
Resolution: Fixed

Fixed by b2bbd9941be6b132a83d27c0ae02c935e1dec5dd

 Memory LEAK - FLATTEN function fails when input array has 99,999 integer type 
 elements
 --

 Key: DRILL-2851
 URL: https://issues.apache.org/jira/browse/DRILL-2851
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 0.9.0
 Environment: 64e3ec52b93e9331aa5179e040eca19afece8317 | DRILL-2611: 
 value vectors should report valid value count | 16.04.2015 @ 13:53:34 EDT
Reporter: Khurram Faraaz
Assignee: Hanifi Gunes
Priority: Critical
 Fix For: 1.2.0

 Attachments: Jsn_Arry100.json


 FLATTEN function does not return results when input array has 99,999 elements 
 of type integer. Test was run on 4 node cluster on CentOS.
 Looks like there is a memory leak somewhere, because I see this message in the 
 stack trace: Failure while closing accountor.
 {code}
 0: jdbc:drill:> select name, flatten(num_list) from `Jsn_Arry100.json`;
 Query failed: SYSTEM ERROR: initialCapacity: -2147483648 (expectd: 0+)
 [2b1960d3-c9e5-43cc-926e-783257cc1a0a on centos-04.qa.lab:31010]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 There are 99,999 integer type elements in the input array.
 {code}
 0: jdbc:drill:> select repeated_count(tmp.num_list) from `Jsn_Arry100.json` 
 tmp;
 ++
 |   EXPR$0   |
 ++
 | 9  |
 ++
 1 row selected (0.206 seconds)
 {code}
 The below query does NOT return all of the 99,999 array elements. Instead, 
 the query returns incomplete/incorrect results.
 {code}
 0: jdbc:drill:> select tmp.num_list from `Jsn_Arry100.json` tmp; 
 ++
 |  num_list  |
 ++
 | 
 [13729690148580628,33968383451250544,19729687836805948,10887002260554076,59060271975938184,8770403528608,73861468909987168,88448989981158368,57022772257593232,39886649400114208,59634364188801728,49220606544154264,14668098707176940,1389807151115602,68917737889186128,63305048453386056,65444797852007920,45819687647405048,2572319061962803,30445371906667328,40923257482,63616924473446912,71267711965641544,39123983528649880,33710976622262344,77269001875184320,21285915577966972,79158399148342640,59927799708140928,86792628837467632,42093378633089096,90092572909620640,27822481748467540,72874902594517600,4613424573716378,30348741516686960,15384400403234324,11312915166709258,67842306780001720,63536034928852224,34064758786460920,26742651581265676,69283348697630136,21337762946874492,33408483778102284,48525199800724472,17366398171254334,78156187420036768,55683717108215368,18931739169089156,53749386072208016,75716769953459472,70124914126143752,59670587242776496,51687272393733984,56590991314575,27798845713791992,57060186084971832,80286905552877744,11576390595076536,67263019646709888,62231475148843856,38916556483991208,56870539861336200,13820727892494552,43440054512663296,43294405699266032,22942812764355256,36368231331952648,52243742256792032,48655336740833488,87161415891865104,74521440214901872,23963586190891468,86994250559679456,64235294682694968,89237837009514944,27981168913939540,86292939130238320,9803308891945780,67756129441807768,47956437308413040,80258260743958512,33828778916469536,8419255582699858,74560162108011744,38543450292892984,21415391273461424,31658676421021728,10459153047723704,73185429768682384,2722949752940800,47317026756664832,54654463675350704,26408396249642988,43938679869946224,56284724555406552,71685791015933536,4600112644422,68794495595243400,4132552956335268,34688634621761076,35712207345268316,60213985727177096,5309212305048668,23619206841198624,64825327846779600,57007947199911488,47136678928834984,9122467237416458,40980849927903520,20914596281235304,77253379050572112,80560013812121728,46022174419866912,22101913922421504,4767879560244378,7295985537104415,33463848115973120,21159496087635528,7209644495944438,8339341099391089,56466941472321008,49199375007197008,19176844581370920,87713551163634272,68410140933429768,22463584602000640,31007049671139340,88991676818494464,86018450036815872,68296039329719152,40855740519055456,49077682665580104,8114179710795960,19904226756293724,38777905573998560,87116624775839824,64713191586087936,23526229712701268,6299956558568,61842195984909040,20542838487485340,64036764083509024,1241102073337856,2271939416579819,40485492208390352,32715355572929904,72573056371850208,79275623956295168,63005955205671488,80329641439025648,50636113571403528,84640203909310624,58152135741332672,76874210847790576,45638822712398088

Re: [VOTE] Release Apache Drill 1.1.0 (rc0)

2015-07-02 Thread Hanifi GUNES
* Jinfeng
- Verified checksum for both the source and binary tar files.

* Hanifi, Sudheesh
- manually inspected the maven repo
- built a query submitter importing the jdbc-all artifact from the repo at
[jacques:3]

Is there a guideline on verifying maven artifacts besides inspecting
published POMs or trying to use them? I could do that if someone points me
to it.


Thanks.
-Hanifi


2015-07-02 20:09 GMT-07:00 Ted Dunning ted.dunn...@gmail.com:

 I haven't seen that anybody is checking signatures and the maven artifacts.

 Is anybody doing that?  If not, the release should be held back until that
 is done.

 (I can't do it due to time pressure)



 On Thu, Jul 2, 2015 at 6:58 PM, Aman Sinha asi...@maprtech.com wrote:

  Downloaded the binary tar-ball.  Installed on my macbook.  Started
 sqlline
  in embedded mode. Saw that sqlline is showing version 1.0.0 instead of
  1.1.0, although 'select * from sys.version'  is showing the right commit.
  Anyone else sees this ?
 
  /sqlline -u jdbc:drill:zk=local -n admin -p admin --maxWidth=10
  ...
  apache drill 1.0.0
  just drill it
 
 
 
  On Thu, Jul 2, 2015 at 6:01 PM, Jason Altekruse 
 altekruseja...@gmail.com
  wrote:
 
   +1 binding
  
   - downloaded and built the source tarball, all tests passed (on MAC
 osx)
   - started sqlline, issued a few queries
   - tried a basic update of storage plugin from the web UI and looked
 over
  a
   few query profiles
  
  
   On Thu, Jul 2, 2015 at 5:42 PM, Mehant Baid baid.meh...@gmail.com
  wrote:
  
+1 (binding)
   
* Downloaded src tar-ball, was able to build and run unit tests
successfully.
* Brought up DrillBit in embedded and distributed mode.
* Ran some TPC-H queries via Sqlline and the web UI.
* Checked the UI for profiles
   
Looks good.
   
Thanks
Mehant
   
   
   
On 7/2/15 5:36 PM, Sudheesh Katkam wrote:
   
+1 (non-binding)
   
* downloaded binary tar-ball
* ran queries (including cancellations) in embedded mode on Mac;
   verified
states in web UI
   
* downloaded and built from source tar-ball; ran unit tests on Mac
* ran queries (including cancellations) on a 3 node cluster;
 verified
states in web UI
   
* built a Java query submitter that uses the maven artifacts
   
Thanks,
Sudheesh
   
 On Jul 2, 2015, at 4:06 PM, Hanifi Gunes hgu...@maprtech.com
  wrote:
   
- fully built and tested Drill from source on CentOS
- deployed on 3 nodes
- ran concurrent queries
- manually inspected maven repo
- built a Scala query submitter importing jdbc-all artifact from
 the
   repo
at [jacques:3]
   
overall, great job!
   
+1 (binding)
   
On Thu, Jul 2, 2015 at 3:16 PM, rahul challapalli 
challapallira...@gmail.com wrote:
   
 +1 (non-binding)
   
Tested the new CTAS auto partition feature
Published jdbc-all artifact looks good as well
   
I am able to add the staged jdbc-all package as a dependency to my
sample
JDBC app's pom file and I was able to connect to my drill
 cluster. I
think
this is a sufficient test for the published artifact.
   
Part of the pom file below
   
<repositories>
  <repository>
    <id>staged-releases</id>
    <url>
http://repository.apache.org/content/repositories/orgapachedrill-1001
    </url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.drill.exec</groupId>
    <artifactId>drill-jdbc-all</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>
   
- Rahul
   
On Thu, Jul 2, 2015 at 2:02 PM, Parth Chandra 
  pchan...@maprtech.com
wrote:
   
 +1 (binding)
   
Release looks good.
Built from source (mvn clean install).
Verified src checksum.
Built C++ client, ran multiple parallel queries from C++ client
   against
drillbit. Tested all datatypes with C++ client.
   
   
   
   
   
On Thu, Jul 2, 2015 at 1:49 PM, Hsuan Yi Chu 
 hyi...@maprtech.com
   
wrote:
   
+1 (non-binding)
   
Unit tests passed on mac & linux VM. Tried a few queries on
 2-node
   VM.
   
All
   
worked out.
   
On Thu, Jul 2, 2015 at 1:24 PM, Norris Lee norr...@simba.com
   wrote:
   
 I built from source on Linux and ran queries against different
  data
sources/file types through ODBC. Also ran our internal ODBC
 test
   
suite.
   
Looks good.
   
+1 (non-binding)
   
Norris
   
-Original Message-
From: Jinfeng Ni [mailto:jinfengn...@gmail.com]
Sent: Wednesday, July 01, 2015 4:03 PM
To: dev@drill.apache.org
Subject: Re: [VOTE] Release Apache Drill 1.1.0 (rc0)
   
-  Download the src tar on Mac and Linux and do a mvn full
 build.
-  Start drill in embedded mode on both Mac and Linux. Run
  several
   
TPCH
   
queries.
-  Tried with the CTAS auto-partitioning feature with some TPCH
   
dataset.
   
-  Verified checksum

Re: [VOTE] Release Apache Drill 1.1.0 (rc0)

2015-07-02 Thread Hanifi Gunes
- fully built and tested Drill from source on CentOS
- deployed on 3 nodes
- ran concurrent queries
- manually inspected maven repo
- built a Scala query submitter importing jdbc-all artifact from the repo
at [jacques:3]

overall, great job!

+1 (binding)

On Thu, Jul 2, 2015 at 3:16 PM, rahul challapalli 
challapallira...@gmail.com wrote:

 +1 (non-binding)

 Tested the new CTAS auto partition feature
 Published jdbc-all artifact looks good as well

 I am able to add the staged jdbc-all package as a dependency to my sample
 JDBC app's pom file and I was able to connect to my drill cluster. I think
 this is a sufficient test for the published artifact.

 Part of the pom file below

 <repositories>
   <repository>
     <id>staged-releases</id>
     <url>
 http://repository.apache.org/content/repositories/orgapachedrill-1001
     </url>
   </repository>
 </repositories>
 <dependencies>
   <dependency>
     <groupId>org.apache.drill.exec</groupId>
     <artifactId>drill-jdbc-all</artifactId>
     <version>1.1.0</version>
   </dependency>
 </dependencies>

 - Rahul

 On Thu, Jul 2, 2015 at 2:02 PM, Parth Chandra pchan...@maprtech.com
 wrote:

  +1 (binding)
 
  Release looks good.
  Built from source (mvn clean install).
  Verified src checksum.
  Built C++ client, ran multiple parallel queries from C++ client against
  drillbit. Tested all datatypes with C++ client.
 
 
 
 
 
  On Thu, Jul 2, 2015 at 1:49 PM, Hsuan Yi Chu hyi...@maprtech.com
 wrote:
 
   +1 (non-binding)
  
   Unit tests passed on mac & linux VM. Tried a few queries on 2-node VM.
  All
   worked out.
  
   On Thu, Jul 2, 2015 at 1:24 PM, Norris Lee norr...@simba.com wrote:
  
I built from source on Linux and ran queries against different data
sources/file types through ODBC. Also ran our internal ODBC test
 suite.
Looks good.
   
+1 (non-binding)
   
Norris
   
-Original Message-
From: Jinfeng Ni [mailto:jinfengn...@gmail.com]
Sent: Wednesday, July 01, 2015 4:03 PM
To: dev@drill.apache.org
Subject: Re: [VOTE] Release Apache Drill 1.1.0 (rc0)
   
-  Download the src tar on Mac and Linux and do a mvn full build.
-  Start drill in embedded mode on both Mac and Linux. Run several
 TPCH
queries.
-  Tried with the CTAS auto-partitioning feature with some TPCH
  dataset.
-  Verified checksum for both the source and binary tar files.
   
All look good.
   
+1  (binding).
   
   
   
On Wed, Jul 1, 2015 at 2:16 PM, Abdel Hakim Deneche 
   adene...@maprtech.com

wrote:
   
 I've built the src on a Mac and on linux vm machine and both were
 successful with all unit tests passing.

 I tried the binary release on my Mac: I started an embedded
 drillbit
 and run some queries, everything seems fine.

 LGTM +1 (non bonding)

 On Wed, Jul 1, 2015 at 11:40 AM, Jacques Nadeau 
 jacq...@apache.org
 wrote:

  Hey Everybody,
 
  I'm happy to propose a new release of Apache Drill, version
 1.1.0.
  This
 is
  the first release candidate (rc0).  It covers a total of 162
 closed
  JIRAs [1].
 
  The tarball artifacts are hosted at [2] and the maven artifacts
  (new
  for this release) are hosted at [3].
 
  The vote will be open for 72 hours ending at Noon Pacific, July
 4,
2015.
 
  [ ] +1
  [ ] +0
  [ ] -1
 
  thanks,
  Jacques
 
  [1]
 
 

  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313
 820version=12329689
  [2] http://people.apache.org/~jacques/apache-drill-1.1.0.rc0/
  [3]
 
  https://repository.apache.org/content/repositories/orgapachedrill-10
  01/
 



 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training 

  http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm
 _campaign=Free%20available
 

   
  
 



Re: Review Request 36103: DRILL-3445: BufferAllocator.buffer() implementations should throw an OutOfMemoryRuntimeException

2015-07-01 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36103/#review90137
---

Ship it!


Looks good.


exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector2.java
 (line 94)
https://reviews.apache.org/r/36103/#comment143113

Not a show-stopper, but I would suggest renaming this to allocateNewSafe, 
following the conventions in the rest of the codebase.
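
For context, the convention elsewhere is roughly the following pair, shown
here as a simplified illustration rather than the actual vector code:
allocateNew() throws on failure, while allocateNewSafe() reports failure
through its return value.

class ExampleVector {
  private long availableBytes = 1 << 20;

  // Throwing variant: callers that cannot recover simply propagate.
  void allocateNew(int bytes) {
    if (!allocateNewSafe(bytes)) {
      throw new RuntimeException("Unable to allocate " + bytes + " bytes");
    }
  }

  // Safe variant: the caller decides how to handle allocation failure.
  boolean allocateNewSafe(int bytes) {
    if (bytes > availableBytes) {
      return false;
    }
    availableBytes -= bytes;
    return true;
  }
}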


- Hanifi Gunes


On July 1, 2015, 9:21 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36103/
 ---
 
 (Updated July 1, 2015, 9:21 p.m.)
 
 
 Review request for drill, Chris Westin and Hanifi Gunes.
 
 
 Bugs: DRILL-3445
 https://issues.apache.org/jira/browse/DRILL-3445
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 BufferAllocator.buffer(int) implementations throw an 
 OutOfMemoryRuntimeException instead of returning null.
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 7103a17 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 50ae770 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java
  016cd92 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/memory/BufferAllocator.java
  811cceb 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/memory/TopLevelAllocator.java
  b4386a4 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector2.java
  7a7c012 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java
  4e03f11 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetDirectByteBufferAllocator.java
  cf30db6 
   exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java 
 10bdf07 
   exec/java-exec/src/main/java/parquet/hadoop/ColumnChunkIncReadStore.java 
 6337d4c 
   exec/java-exec/src/test/java/org/apache/drill/TestAllocationException.java 
 051ad4e 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
 74ce225 
 
 Diff: https://reviews.apache.org/r/36103/diff/
 
 
 Testing
 ---
 
 unit tests are passing, along with functional and tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: Review Request 36070: DRILL-3243: Need a better error message - Use of alias in window function definition

2015-07-01 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36070/#review90194
---

Ship it!


- Hanifi Gunes


On June 30, 2015, 11:20 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36070/
 ---
 
 (Updated June 30, 2015, 11:20 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Bugs: DRILL-3243
 https://issues.apache.org/jira/browse/DRILL-3243
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 improved error message and added a unit test
 
 
 Diffs
 -
 
   common/src/main/java/org/apache/drill/common/exceptions/UserException.java 
 13c17bd 
   
 common/src/main/java/org/apache/drill/common/exceptions/UserRemoteException.java
  1b3fa42 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java
  254e0d8 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/RepeatedVarCharOutput.java
  40276f4 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/store/text/TestNewTextReader.java
  76674f9 
 
 Diff: https://reviews.apache.org/r/36070/diff/
 
 
 Testing
 ---
 
 all unit tests are passing along with functional and tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: Review Request 35887: DRILL-3312: PageReader.allocatePageData() calls BufferAllocator.buffer(int) but doesn't check if the result is null

2015-07-01 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35887/#review90195
---



exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 (line 305)
https://reviews.apache.org/r/35887/#comment143197

This change does not seem needed after 3445. Should we discard it?


- Hanifi Gunes


On June 25, 2015, 7:38 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35887/
 ---
 
 (Updated June 25, 2015, 7:38 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Bugs: DRILL-3312
 https://issues.apache.org/jira/browse/DRILL-3312
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 made sure all calls to BufferAllocator.buffer() throw an OOM if the buffer is 
 null
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
  8c73b2a 
 
 Diff: https://reviews.apache.org/r/35887/diff/
 
 
 Testing
 ---
 
 all unit tests are passing along with functional and tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: SchemaFactory.logger

2015-07-01 Thread Hanifi Gunes
@Daniel, add BufferAllocator interface as well.

On Tue, Jun 30, 2015 at 4:58 PM, Daniel Barclay dbarc...@maprtech.com
wrote:

 Jacques Nadeau wrote:

 Probably my fault.

 The most common reason is because it used to be an implementation and then
 got turned into an interface.

 Agreed that they should be removed.

 Roger.  I created DRILL-3439 
 https://issues.apache.org/jira/browse/DRILL-3439Some interfaces have
 logger fields. https://issues.apache.org/jira/browse/DRILL-3439

 Daniel



 On Tue, Jun 30, 2015 at 2:42 PM, Daniel Barclay dbarc...@maprtech.com
 wrote:

  Why does _interface_ org.apache.drill.exec.store.SchemaFactory declare a
 logger?

 Daniel
 --
 Daniel Barclay
 MapR Technologies



 --
 Daniel Barclay
 MapR Technologies




[jira] [Created] (DRILL-3441) CompliantTextRecordReader#isStarQuery loops indefinitely

2015-06-30 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-3441:
---

 Summary: CompliantTextRecordReader#isStarQuery loops indefinitely
 Key: DRILL-3441
 URL: https://issues.apache.org/jira/browse/DRILL-3441
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Text & CSV
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


The implementation recurses into itself and never terminates. We should fix this.

{code}
  @Override
  public boolean isStarQuery() {
if(settings.isUseRepeatedVarChar()){
  ...
}else{
  return isStarQuery();
}
  }
{code}
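
A plausible fix (an assumption here; the ticket does not record the final patch) is to delegate to the base class rather than recurse into the override:

{code:title=sketch of a possible fix, with hypothetical stand-in classes}
abstract class BaseReader {
  boolean isStarQuery() { return false; } // terminating default in the parent
}

class CompliantReader extends BaseReader {
  private final boolean useRepeatedVarChar;

  CompliantReader(boolean useRepeatedVarChar) {
    this.useRepeatedVarChar = useRepeatedVarChar;
  }

  @Override
  boolean isStarQuery() {
    if (useRepeatedVarChar) {
      return true; // stands in for the elided branch in the snippet above
    }
    return super.isStarQuery(); // was: return isStarQuery(); -- infinite recursion
  }
}
{code}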



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSSION] should BufferAllocator.buffer() throw an OutOfMemoryException ?

2015-06-30 Thread Hanifi GUNES
- We would end up having to add it to almost everything everywhere
Why would one propagate the checked exception for no reason? And why would
one not propagate the exception for a good reason like robustness? I agree
that one has to avoid propagating the checked exception for no reason
however I support propagating it for a good reason.

The added benefit of raising a checked exception is reminding as well as
enforcing devs to handle it and be more cautious about this particular
event. I find this compile-time safety check invaluable for robustness.


- Or constantly wrapping it with RuntimeException to get around that
If it has to be done, I would recommend relying on a helper to do so. There
is not much manual work involved here.
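
For concreteness, a minimal self-contained sketch of that pattern, using
hypothetical stand-in types rather than the actual Drill allocator API:

import java.nio.ByteBuffer;

class OutOfMemoryException extends Exception {
  OutOfMemoryException(String msg) { super(msg); }
}

interface Allocator {
  // Checked, as proposed: the compiler forces every caller to decide.
  ByteBuffer buffer(int size) throws OutOfMemoryException;
}

final class AllocationHelper {
  private AllocationHelper() {}

  // For call sites that cannot recover: suppress the checked exception in one
  // consistent place instead of scattering try/catch blocks everywhere.
  static ByteBuffer bufferOrFail(Allocator allocator, int size) {
    try {
      return allocator.buffer(size);
    } catch (OutOfMemoryException e) {
      throw new IllegalStateException("Failed to allocate " + size + " bytes", e);
    }
  }
}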


2015-06-30 13:53 GMT-07:00 Abdel Hakim Deneche adene...@maprtech.com:

 +1 to Hanifi's

 On Tue, Jun 30, 2015 at 1:38 PM, Jason Altekruse altekruseja...@gmail.com
 
 wrote:

  +1 to Hanifi's comments, I think it makes much more sense to have a
 number
  of sites where the operators are explicitly catching a checked OOM
  exception and either decide to handle it or produce a message like Hash
  Aggregate does not support out of memory conditions. This would be
  particularly useful for debugging queries, as the user exception can
  provide context information about the current operation. This way users
 can
  have some idea about the part of their query that might be causing an
  excessive strain on system resources. I understand that there are also
  cases where operators competing for memory can make it a toss up to which
  will actually fail, but this would at least be a step to give more
 detailed
  information to users.
 
  On Tue, Jun 30, 2015 at 1:28 PM, Hanifi GUNES h...@apache.org wrote:
 
   I would propose throwing a checked exception encouraging explicit and
   consistent handling of this event. Each sub-system has liberty to
 decide
  if
   an OOM failure is fatal or non-fatal depending on its capabilities.
 Also
  if
   at some point a sub-system needs to communicate with its callers via a
    different mechanism such as using flags (boolean, enum etc.) or
 raising
  an
   unchecked exception that's still fine, just handle the exception. If
  there
   is a need to suppress the checked exception that's fine too, just use a
   helper method.
  
   Either way, returning *null* sounds problematic in many ways i) it is
   implicit ii) unsafe iii) its handling logic is repetitive iv) it is
   semantically unclean to make null mean something - even worse something
   context specific.
  
  
   -Hanifi
  
   2015-06-30 12:23 GMT-07:00 Abdel Hakim Deneche adene...@maprtech.com
 :
  
I guess that would fix the issue too. But we may still run into
   situations
where the caller will pass a flag to mute the exception and not
  handle
the case anyway.
   
If .buffer() unconditionally throws an exception, can't the caller,
 who
wants to, just catch that and handle it properly ?
   
On Tue, Jun 30, 2015 at 12:13 PM, Chris Westin 
  chriswesti...@gmail.com
wrote:
   
 No, but we should do something close to that.

 There are cases where the caller can handle the inability to get
 more
 memory, and may be able to go to disk. However, you are correct
 that
there
 are many that can't handle an OOM, and that fail to check.

 Instead of unconditionally throwing OutOfMemoryRuntimeException, I
   would
 suggest that the buffer() call take a flag that indicates whether
 or
   not
it
 should throw if it is unable to fulfill the request. This way, the
  call
 sites that can handle an OOM can pass in the flag to return null,
 and
   the
 rest can pass in the flag value to throw, and not have to have any
checking
 code.


 On Tue, Jun 30, 2015 at 12:06 PM, Abdel Hakim Deneche 
 adene...@maprtech.com
  wrote:

  our current implementations of BufferAllocator.buffer(int, int)
   returns
  null when it cannot allocate the buffer.
 
  But looking through the code, there are many places that don't
  check
   if
 the
  allocated buffer is null before trying to access it which will
  throw
   a
  NullPointerException.
 
  ValueVectors' allocateNewSafe() seem to be the only place that
  handle
the
  null in a specific manner.
 
  Should we update the allocators' implementation to throw an
  OutOfMemoryRuntimeException instead of returning null ? this has
  the
 added
  benefit of displaying a proper out of memory error message to the
   user.
 
  Thanks!
 
  --
 
  Abdelhakim Deneche
 
  Software Engineer
 
http://www.mapr.com/
 
 
  
 

   
   
   
--
   
Abdelhakim Deneche

Re: [DISCUSSION] should BufferAllocator.buffer() throw an OutOfMemoryException ?

2015-06-30 Thread Hanifi Gunes
,
someone should go through and make an example changeset for everyone
 to
review.  My guess is, the only that it works will be if very quickly,
  the
handling code converts the checked exception into an unchecked
UserException.  But I'm more than happy to be proven wrong.
   
On Tue, Jun 30, 2015 at 2:16 PM, Daniel Barclay 
 dbarc...@maprtech.com
  
wrote:
   
 Hanifi GUNES wrote:

 - We would end up having to add it to almost everything everywhere
 Why would one propagate the checked exception for no reason? And
 why
would
 one not propagate the exception for a good reason like
 robustness? I
agree
 that one has to avoid propagating the checked exception for no
  reason
 however I support propagating it for a good reason.

 The added benefit of raising a checked exception is reminding as
  well
   as
 enforcing devs to handle it and be more cautious about this
  particular
 event. I find this compile-time safety check invaluable for
   robustness.


 +(1 times some large number)

 Daniel




 - Or constantly wrapping it with RuntimeException to get around
 that
 If it has to be done, I would recommend relying on a helper to do
  so.
 There
 is not much manual work involved here.


 2015-06-30 13:53 GMT-07:00 Abdel Hakim Deneche 
  adene...@maprtech.com
   :

  +1 to Hanifi's

 On Tue, Jun 30, 2015 at 1:38 PM, Jason Altekruse 
 altekruseja...@gmail.com


  wrote:

  +1 to Hanifi's comments, I think it makes much more sense to
 have
  a

 number

 of sites where the operators are explicitly catching a checked
 OOM
 exception and either decide to handle it or produce a message
 like
Hash
 Aggregate does not support out of memory conditions. This would
  be
 particularly useful for debugging queries, as the user exception
  can
 provide context information about the current operation. This
 way
users

 can

 have some idea about the part of their query that might be
 causing
   an
 excessive strain on system resources. I understand that there
 are
   also
 cases where operators competing for memory can make it a toss up
  to
 which
 will actually fail, but this would at least be a step to give
 more

 detailed

 information to users.

 On Tue, Jun 30, 2015 at 1:28 PM, Hanifi GUNES h...@apache.org
   wrote:

  I would propose throwing a checked exception encouraging
 explicit
   and
 consistent handling of this event. Each sub-system has liberty
 to

 decide

 if

 an OOM failure is fatal or non-fatal depending on its
  capabilities.

 Also

 if

 at some point a sub-system needs to communicate with its
 callers
   via
a
 different mechanism such as using flags (boolean, enum etc.)
 or

 raising

 an

 unchecked exception that's still fine, just handle the
 exception.
   If

 there

 is a need to suppress the checked exception that's fine too,
 just
use a
 helper method.

 Either way, returning *null* sounds problematic in many ways i)
  it
   is
 implicit ii) unsafe iii) its handling logic is repetitive iv)
 it
  is
 semantically unclean to make null mean something - even worse
something
 context specific.


 -Hanifi

 2015-06-30 12:23 GMT-07:00 Abdel Hakim Deneche 
adene...@maprtech.com

 :


  I guess that would fix the issue too. But we may still run
 into

 situations

 where the caller will pass a flag to mute the exception and
  not

 handle

 the case anyway.

 If .buffer() unconditionally throws an exception, can't the
   caller,

 who

 wants to, just catch that and handle it properly ?

 On Tue, Jun 30, 2015 at 12:13 PM, Chris Westin 

 chriswesti...@gmail.com

 wrote:

  No, but we should do something close to that.

 There are cases where the caller can handle the inability to
  get

 more

 memory, and may be able to go to disk. However, you are correct

 that

 there

 are many that can't handle an OOM, and that fail to check.

 Instead of unconditionally throwing
  OutOfMemoryRuntimeException,
   I

 would

 suggest that the buffer() call take a flag that indicates
  whether

 or

 not

 it

 should throw if it is unable to fulfill the request. This
 way,
   the

 call

 sites that can handle an OOM can pass in the flag to return
 null,

 and

 the

 rest can pass in the flag value to throw, and not have to have
  any

 checking

 code.


 On Tue, Jun 30, 2015 at 12:06 PM, Abdel Hakim

Re: Review Request 35942: DRILL-1673 (reopened) : issue with flatten when used with nested lists

2015-06-26 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35942/#review89567
---



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/EmptyValuePopulator.java
 (line 46)
https://reviews.apache.org/r/35942/#comment142191

This seems to fix the symptom. The only case I can think of that would 
require this ternary statement here is when the offsets vector has zero elements, 
which breaks the invariant of this class. What is causing the offsets vector to 
come up empty?


- Hanifi Gunes


On June 26, 2015, 8:35 p.m., Jason Altekruse wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35942/
 ---
 
 (Updated June 26, 2015, 8:35 p.m.)
 
 
 Review request for drill and Venki Korukanti.
 
 
 Bugs: DRILL-1673
 https://issues.apache.org/jira/browse/DRILL-1673
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 There were two small issues in ValueVectors that caused this to fail. This 
 patch fixes those issues and adds a few new flatten tests. To validate the 
 flatten results a small reference implementation was added to generate 
 baselines.
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/EmptyValuePopulator.java
  8c61a60 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
  f538399 
   exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java d4e7ed6 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlatten.java
  6f5a303 
   
 exec/java-exec/src/test/resources/flatten/complex_transaction_example_data.json
  PRE-CREATION 
   exec/java-exec/src/test/resources/store/json/1673.json PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/35942/diff/
 
 
 Testing
 ---
 
 Unit tests run, regression in progress
 
 
 Thanks,
 
 Jason Altekruse
 




Re: Review Request 35609: DRILL-3243: Need a better error message - Use of alias in window function definition

2015-06-23 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35609/#review89036
---

Ship it!


Ship It!

- Hanifi Gunes


On June 18, 2015, 3:14 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35609/
 ---
 
 (Updated June 18, 2015, 3:14 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Bugs: DRILL-3243
 https://issues.apache.org/jira/browse/DRILL-3243
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 changed RepeatedVarCharOutput to display the column name as part of the 
 exception's message
 changed CompliantTestRecordReader to throw a DATA_READ user exception
 added new unit test to TestNewTextReader
 
 
 Diffs
 -
 
   common/src/main/java/org/apache/drill/common/exceptions/UserException.java 
 6f28a2b 
   
 common/src/main/java/org/apache/drill/common/exceptions/UserRemoteException.java
  1b3fa42 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java
  254e0d8 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/RepeatedVarCharOutput.java
  40276f4 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/store/text/TestNewTextReader.java
  76674f9 
 
 Diff: https://reviews.apache.org/r/35609/diff/
 
 
 Testing
 ---
 
 all unit tests are passing along with functional and tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: Review Request 35623: DRILL-2494: Have PreparedStmt. set-param. methods throw unsupported.

2015-06-23 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35623/#review89035
---

Ship it!


Ship It!

- Hanifi Gunes


On June 19, 2015, 7 p.m., Daniel Barclay wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35623/
 ---
 
 (Updated June 19, 2015, 7 p.m.)
 
 
 Review request for drill, Hanifi Gunes and Mehant Baid.
 
 
 Bugs: DRILL-2494
 https://issues.apache.org/jira/browse/DRILL-2494
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Added (integration-level) unit test.
 
 Modified set-parameter methods to throw SQLFeatureNotSupportedException.
 (Intercepted common getParameter method.)
 
 Inserted DrillPreparedStatement into hierarchy for place for documentation.
 
 Documented that parameter-setting methods are not supported.
 
 
 Diffs
 -
 
   exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillPreparedStatement.java 
 PRE-CREATION 
   
 exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillPreparedStatementImpl.java
  5e9ec93 
   exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/35623/diff/
 
 
 Testing
 ---
 
 Added specific unit tests.
 
 Ran existing tests; no new failures.
 
 
 Thanks,
 
 Daniel Barclay
 




Re: [DISCUSS] Allowing the option to use github pull requests in place of reviewboard

2015-06-23 Thread Hanifi Gunes
+1

At the very least GitHub will be UP.

On Tue, Jun 23, 2015 at 2:18 PM, Parth Chandra pchan...@maprtech.com
wrote:

 +1 on trying this. RB has been pretty painful to us.



 On Mon, Jun 22, 2015 at 9:45 PM, Matthew Burgess mattyb...@gmail.com
 wrote:

  Is Travis https://travis-ci.org/  a viable option for the GitHub
 route?
  I
  use it for my own projects to build pull requests (with additional code
  quality targets like CheckStyle, PMD, etc.). Perhaps that would take some
  of
  the burden off the reviewers and let them focus on the proposed
  implementations, rather than some of the more tedious aspects of each
  review.
 
  From:  Jacques Nadeau jacq...@apache.org
  Reply-To:  dev@drill.apache.org
  Date:  Monday, June 22, 2015 at 10:22 PM
  To:  dev@drill.apache.org dev@drill.apache.org
  Subject:  Re: [DISCUSS] Allowing the option to use github pull requests
 in
  place of reviewboard
 
  I'm up for this if we deprecate the old way.  Having two different
  processes seems like overkill.  In general, I find the review interface
 of
  GitHub less expressive/clear but everything else is way better.
 
  On Mon, Jun 22, 2015 at 6:59 PM, Steven Phillips sphill...@maprtech.com
 
  wrote:
 
+1
  
I am in favor of giving this a try.
  
If I remember correctly, the reason we abandoned pull requests
  originally
was because we couldn't close the pull requests through Github. A
  solution
could be for whoever pushes the commit to the apache git repo to add the
line "Closes #<pull request number>". Github would then automatically close the
pull request.
  
On Mon, Jun 22, 2015 at 1:02 PM, Jason Altekruse 
  altekruseja...@gmail.com

wrote:
  
 Hello Drill developers,

 I am writing this message today to propose allowing the use of
 github
pull
 requests to perform reviews in place of the apache reviewboard
  instance.

 Reviewboard has caused a number of headaches in the past few
 months,
  and
I
 think its time to evaluate the benefits of the apache
 infrastructure
 relative to the actual cost of using it in practice.

 For clarity of the discussion, we cannot use the complete github
workflow.
 Comitters will still need to use patch files, or check out the
 branch
used
 in the review request and push to apache master manually. I am not
 advocating for using a merging strategy with git, just for using
 the
github
 web UI for reviews. I expect anyone generating a chain of commits
 as
 described below to use the rebasing workflow we do today.
  Additionally
devs
 should only be breaking up work to make it easier to review, we
 will
  not
be
 reviewing branches that contain a bunch of useless WIP commits.

 A few examples of problems I have experienced with reviewboard
  include:
 corruption of patches when they are downloaded, the web interface
  showing
 inconsistent content from the raw diff, and random rejection of
  patches
 that are based directly on the head of apache master.

 These are all serious blockers for getting code reviewed and
  integrated
 into the master branch in a timely manner.

 In addition to serious bugs in reviewboard, there are a number of
 difficulties with the combination of our typical dev workflow and
 how
 reviewboard works with patches. As we are still adding features to
  Drill,
 we often have several weeks of work to submit in response to a JIRA
  or
 series of related JIRAs. Sometimes this work can be broken up into
 independent reviewable units, and other times it cannot. When a
  series of
 changes requires a mixture of refactoring and additions, the
 process
  is
 currently quite painful. Either reviewers need to look through a
 giant
messy
 diff, or the submitters need to do a lot of extra work. This
  involves not
 only organizing their work into a reviewable series of commits, but
  also
 generating redundant squashed versions of the intermediate work to
  make
 reviewboard happy.

 For a relatively simple 3 part change, this involves creating 3
reviewboard
 pages. The first will contain the first commit by itself. The
 second
  will
 have the first commits patch as a parent patch with the next change
  in
the
 series uploaded as the core change to review. For the third
 change, a
 squashed version of the first two commits must be generated to
 serve
  as a
 parent patch and then the third changeset uploaded as the
 reviewable
 change. Frequently a change to the first commit requires
  regenerating all
 of these patches and uploading them to the individual review pages.

 This gets even worse with larger chains of commits.

 It would be great if all of our changes could be small units of
  work, but
 very frequently we want to make sure we are ready to merge a
 complete
 feature before 

Re: Review Request 35484: DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent overflows

2015-06-22 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35484/
---

(Updated June 23, 2015, 1:20 a.m.)


Review request for drill, Mehant Baid and Venki Korukanti.


Changes
---

Added unit tests to cover various reallocation scenarios.
Fixed unit tests to avoid leaks via closing out resources.
VVs throw OversizedAllocationException if allocation demand is more than the 
allowed max.
Ensure that flatten handles OversizedAllocationException to split the batch and 
resume the execution.
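
For concreteness, a rough sketch of the bound being introduced (hypothetical
field names; the real vector templates differ):

class OversizedAllocationException extends RuntimeException {
  OversizedAllocationException(String msg) { super(msg); }
}

class ReallocSketch {
  // Buffer sizes are tracked with ints, so expansion must never exceed this.
  private static final long MAX_ALLOCATION_SIZE = Integer.MAX_VALUE;
  private long allocationSizeInBytes = 4096;

  void reAlloc() {
    final long newAllocationSize = allocationSizeInBytes * 2;
    if (newAllocationSize > MAX_ALLOCATION_SIZE) {
      // Surfaced to operators (e.g. flatten), which can split the batch and resume.
      throw new OversizedAllocationException("Unable to expand the buffer");
    }
    allocationSizeInBytes = newAllocationSize;
    // ... allocate the larger buffer here and copy the old contents over ...
  }
}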


Repository: drill-git


Description
---

DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent overflows
Vectors
- set an upper bound on # of bytes to allocate
- 
TestValueVector.java
- Add unit tests


Diffs (updated)
-

  exec/java-exec/src/main/codegen/includes/vv_imports.ftl 
92c80072cfcde4deb0bbb34bc3b688707541f2f6 
  exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 
7103a17108693d47839212c418d11d13fbb8f6f4 
  exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 
7f835424b68a9d68b0a6c60749677a83ac486590 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
50ae770f24aff1e8eed1dfa800878ce92308c644 
  
exec/java-exec/src/main/java/org/apache/drill/exec/exception/OversizedAllocationException.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 999140498ab303d3f5ecf20695755bdfe943cb46 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
 de67b62248a68c1f483808c4b575e0afa7854aca 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
 92cf79d37da89864ab7702830fe078479773a73e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java
 0e38f3cad3792e936ff918ae970f4b40e478d516 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java 
8129668b6ff5dc674e30dca6947bd93c87fb4d3d 
  exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java 
10bdf0752632c7577b9a6eb445c7101ec1a24730 
  
exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestValueVector.java
 037c8c6d3da94acf5c2ca300ce617338cacb0fb0 

Diff: https://reviews.apache.org/r/35484/diff/


Testing
---

all


Thanks,

Hanifi Gunes



Re: Review Request 35484: DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent overflows

2015-06-19 Thread Hanifi Gunes


 On June 19, 2015, 5:06 p.m., Jason Altekruse wrote:
  exec/java-exec/src/main/codegen/templates/FixedValueVectors.java, line 92
  https://reviews.apache.org/r/35484/diff/1/?file=985297#file985297line92
 
  The reallocation loops will run infinitely if we have it the maximum 
  buffer size. This can be reproduced by reading a JSON file with a lot of 
  large lists. This could be fixed by re-intriducing the code we had before 
  re-allocation, but only in the case where we hit one of these limits. 
  Allocations should be able to fail, and we need to make sure the operators 
  can handle this case.
  
  We should fail with an OOM error in these cases until we can split 
  lists across batches or find another way around the fact that the allocator 
  tracks buffer lengths with ints.
 
 Jason Altekruse wrote:
 *hit the maximum buffer size,   *re-introducing the code
 
 Hanifi Gunes wrote:
 I doubt if it is a good idea to fail with OOM. Instead we should consider 
 setting the upper bound to max allowed by the allocator (Int.MAX) and ensure 
 that this invariant holds across vector impls.
 
 Jason Altekruse wrote:
 The problem is that the consumers of this interface are currently working 
 under the assumption that a request to write to a vector will not fail unless 
 we run out of memory. We need to go back and update the operators to stop 
 their processing when we cannot do a write into the vector, or emulate the 
 case they expect, which is OOM.

Sounds good but an operation on the buffer that goes beyond its boundaries does 
not translate into OOM but perhaps to an IOOB. It is only when the allocator cannot 
supply an allocation demand that we should expect an OOM. I am inclined to 
throw an IOOB here for the case where we go beyond the max buffer capacity.


- Hanifi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35484/#review88551
---


On June 17, 2015, 6:54 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35484/
 ---
 
 (Updated June 17, 2015, 6:54 p.m.)
 
 
 Review request for drill, Mehant Baid and Venki Korukanti.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent 
 overflows
 Vectors
 - set an upper bound on # of bytes to allocate
 - 
 TestValueVector.java  
 - Add unit tests
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 
 7103a17108693d47839212c418d11d13fbb8f6f4 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 bd41e10d3f69e13d0f8c426460af5e9a09d93fd9 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java
  ec409a3fc59616708226aa500ccab1680cd261f6 
   exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java 
 10bdf0752632c7577b9a6eb445c7101ec1a24730 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestValueVector.java
  037c8c6d3da94acf5c2ca300ce617338cacb0fb0 
 
 Diff: https://reviews.apache.org/r/35484/diff/
 
 
 Testing
 ---
 
 all
 
 
 Thanks,
 
 Hanifi Gunes
 




[jira] [Created] (DRILL-3313) Eliminate redundant #load methods and unit-test loading & exporting of vectors

2015-06-18 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-3313:
---

 Summary: Eliminate redundant #load methods and unit-test loading &
exporting of vectors
 Key: DRILL-3313
 URL: https://issues.apache.org/jira/browse/DRILL-3313
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Data Types
Affects Versions: 1.0.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Vectors have multiple #load methods that are used to populate data from raw 
buffers. It is relatively tough to reason about, maintain, and unit-test loading and 
exporting of data since there is much redundant code around the load methods. This 
issue proposes having a single #load method conforming to the VV#load(def, buffer) 
signature, eliminating all other #load overrides.
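
To make the proposal concrete, a sketch of the consolidated signature
(hypothetical stand-in types):

{code}
// One load entry point: everything needed to rebuild the vector travels in
// (metadata, buffer), so the per-shape overrides disappear.
interface LoadableVector {
  void load(SerializedFieldDef metadata, java.nio.ByteBuffer buffer);
}

class SerializedFieldDef {
  final int valueCount;    // how many values the buffer encodes
  final int bufferLength;  // how many bytes of the buffer they occupy

  SerializedFieldDef(int valueCount, int bufferLength) {
    this.valueCount = valueCount;
    this.bufferLength = bufferLength;
  }
}
{code}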



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35573: DRILL-3304: improve org.apache.drill.exec.expr.TypeHelper error messages when UnsupportedOprationException is thrown

2015-06-17 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35573/#review88285
---

Ship it!


Ship it once comments are addressed.


exec/java-exec/src/main/codegen/templates/TypeHelper.java (line 471)
https://reviews.apache.org/r/35573/#comment140728

Would you use buildErrMsg here as well?



exec/java-exec/src/main/codegen/templates/TypeHelper.java (line 494)
https://reviews.apache.org/r/35573/#comment140729

as well as all others that follow


- Hanifi Gunes


On June 17, 2015, 8:30 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35573/
 ---
 
 (Updated June 17, 2015, 8:30 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Bugs: DRILL-3304
 https://issues.apache.org/jira/browse/DRILL-3304
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Made some changes to TypeHelper template to display the failed operation 
 and minor-type + data-mode when an UnsupportedOperationException is thrown
 
 
 Diffs
 -
 
   common/src/main/java/org/apache/drill/common/exceptions/ErrorHelper.java 
 5dd9b67 
   exec/java-exec/src/main/codegen/templates/TypeHelper.java ad818bd 
 
 Diff: https://reviews.apache.org/r/35573/diff/
 
 
 Testing
 ---
 
 all unit tests are passing along with functional/tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: [DISCUSS] Making the drill codebase easier to unit test

2015-06-17 Thread Hanifi Gunes
Some sub-systems that I know of, particularly around readers, writers, VVs
and operators are not unit-testing friendly by design: First, they involve
much more logic than one could define as a unit. Second, it is relatively
tough if not impossible to control their behavior, mock or inject
dependencies because they are tightly coupled with other parts of the
system. I would propose starting off with very fundamental yet minor code
refactoring that aims to have self-contained, cohesive pieces abstracted
away so that we could get these unit-tested first. Applying this
idea iteratively should bring better test coverage. Then we can focus on
testing operators or other components that rely on these well tested units.
Either way I would prefer a piece-meal approach rather than trying to
unit-test an entire sub-system.

-Hanifi

On Wed, Jun 17, 2015 at 1:53 PM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 I don't know how much work this involves (it seems a lot!) but this would be
 really useful. Like you said, with the current model coming up with good
 unit tests can be really tricky especially when testing the edge cases, and
 the worst part is that any changes to how queries are planned or for
 example the size of the batches can make some tests useless.

 On Tue, Jun 16, 2015 at 12:38 PM, Jason Altekruse 
 altekruseja...@gmail.com
 wrote:

  Hello Drill devs,
 
  I would like to propose a proactive effort to make the Drill codebase
  easier to unit test.
  Many JIRAs have been created for bugs that should have been prevented by
  better unit testing, and we are still fixing these kinds of bugs today as
  they crop up. I have a few ideas, and I plan on creating JIRAs for
 specific
  refactoring and test infrastructure improvements. Before I do, I would
 like
  to collect thoughts from everyone on what can get us the most benefit for
  our work.
 
  As a short overview of the situation today, most of the tests in Drill
 take
  the form of running a SQL query on a local drillbit and verifying the
  results. Plenty of times this has been described as more of integration
  testing than unit testing, and it has caused several common testing pains
  and gaps.
 
  1. batch boundaries - as we cannot control where batches are cut off
 during
  the query, complete queries often make it hard to test different
 scenarios
  processing an incoming stream of data with given properties.
    - examples of issues: inconsistent behavior between operators; some
      operators have failed to handle empty batches, or a batch full of nulls,
      until we wrote a test that happened to have the right input file and
      plan to produce these scenarios
  2. Valid planning changes can end up making tests previously designed to
  test execution fail in new ways as the data will now flow differently
  through the operators
  3. SQL queries as test specifications make it hard to test everything,
  all types, all possible data properties/structures, all possible switches
  flipped in the planner or configuration for an operator
 
  I would like to start the discussion with a proposal to fix some of these
  problems. We need a way to run an operator easily in isolation. Possible
  steps to achieve this include, a new operator that will produce data in
  explicitly provided batches, that can be configured from a test. This can
  serve as a universal input to unit test operators. We would also need
 some
  way to consume and verify the output of the operators. This could share
  code with the current query execution, or possibly side step it to avoid
  having to mock or instantiate the whole query context.
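
As a minimal sketch of such a test-only source (hypothetical types, not
Drill's actual operator interfaces):

import java.util.Iterator;
import java.util.List;

interface Batch {
  int recordCount(); // zero for the empty batches we want to force-feed
}

class ScriptedBatchSource implements Iterator<Batch> {
  private final Iterator<Batch> script;

  ScriptedBatchSource(List<Batch> batches) {
    this.script = batches.iterator();
  }

  @Override
  public boolean hasNext() {
    return script.hasNext();
  }

  // Hands the operator under test the next pre-built batch -- including empty
  // or all-null batches that real query plans rarely happen to produce.
  @Override
  public Batch next() {
    return script.next();
  }
}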
 
  This proposal itself is testing a relatively large part of the system as
 a
  whole unit. I would be interested to hear opinions on the utility vs
  extra effort of trying to refactor more classes so that they can be
 created
  in tests and have their individual methods tested. This is already being
  done for some classes like the value vectors, but it is far from
  exhaustive. I don't expect us to start rigidly enforcing this level of
  testing granularity everywhere, but there are components of the system
 that
  really need to be resilient and be guaranteed to stay that way as the
  project evolves.
 
  Please chime in with your thoughts.
 



 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 



Re: Review Request 35329: DRILL-2997: Remove references to groupCount from SerializedField

2015-06-15 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35329/
---

(Updated June 15, 2015, 11:12 p.m.)


Review request for drill, Mehant Baid and Parth Chandra.


Changes
---

Addressed the feedback. Removed an unused import. Refactored a test case to use a field 
instance.


Repository: drill-git


Description
---

DRILL-2997: Remove references to groupCount from SerializedField

- Remove references to group count where applicable and adapt vectors to work 
with the changes.
- Fix misc test cases

RepeatedValueVectors
- get rid of multiple #load methods and rely on load(metadata, buffer) 
regardless of the vector type.

BaseValueVector
- all vectors must have a field

VariableLengthVectors
- all vectors must have a field, as does the offsets vector

MapVector
- refactor #load for better code readability and consistency

RepeatedFixedWidthVectorLike & RepeatedVariableWidthVectorLike
- #load is now unneeded


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
12dce2596a7d590d2d33c85fca5c47acb2495a25 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
bd41e10d3f69e13d0f8c426460af5e9a09d93fd9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java
 de6f66598927bee5fe3afb1d1046cc20c136b424 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java 
ec409a3fc59616708226aa500ccab1680cd261f6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 af364bd5487eb510b2ffc8219a37611b828136e5 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 f292e4c22f07a65036dbdae2cbe809e00758faf6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java
 d0f38c2a397aac7eaad247c39b4b856c89c970a0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
 fb7ed2a975e095b1f49d3a24cd735b8c7551c7f1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
 f6d3d88ca32a6a3320ec317dbe99c0031f0d54c6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 4617ede4aaaebe36dbe9f50dccbff107351dc40f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
 c57143e24e2c60ba400c4f3fcfb9c012e5e25a89 
  exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 
239a099e66a0a93ce5b517f67b6b20bb5d4b906a 
  
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/TestEmptyPopulation.java
 06a73e22b981d5b8ae7289c171afa703eab91788 
  
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/fn/TestJsonReaderWithSparseFiles.java
 544b962142e24f6c185ad91384d6cb270776acb3 
  protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserBitShared.java 
bee2a3dfac6325460e9902d901b72a6c72cb7d81 
  protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 
92afa4f4b6fc223fb9179887d808c3f2925f303b 
  protocol/src/main/java/org/apache/drill/exec/proto/beans/SerializedField.java 
699097a0ab45468b90ec8f5141108c844dbfadf1 
  protocol/src/main/protobuf/UserBitShared.proto 
68c8612dadfc5cbe4a8157ec516ddf7246f5b956 

Diff: https://reviews.apache.org/r/35329/diff/


Testing
---

unit + regression


Thanks,

Hanifi Gunes



[jira] [Created] (DRILL-3282) Example in workspace documentation is invalid

2015-06-11 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-3282:
---

 Summary: Example in workspace documentation is invalid
 Key: DRILL-3282
 URL: https://issues.apache.org/jira/browse/DRILL-3282
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Hanifi Gunes
Assignee: Bridget Bevens


The example workspaces given in [file system storage 
plugin|http://drill.apache.org/docs/file-system-storage-plugin/] are wrong in 
that configuration entries are case-sensitive and should have been typed in 
camel case. This rule applies to storage plugin configurations in general.

{code:title=an invalid example workspace definition from documentation}
"root": {
  "location": "/user/max/donuts",
  "writable": false,
  "defaultinputformat": null // invalid entry
 }
{code}

{code:title=working workspace definition with entry names camel cased}
"root": {
  "location": "/user/max/donuts",
  "writable": false,
  "defaultInputFormat": null // camel case
 }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35329: DRILL-2997: Remove references to groupCount from SerializedField

2015-06-10 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35329/
---

(Updated June 10, 2015, 11:40 p.m.)


Review request for drill and Parth Chandra.


Changes
---

Remove old code.


Repository: drill-git


Description
---

DRILL-2997: Remove references to groupCount from SerializedField

- Remove references to group count where applicable and adapt vectors to work 
with the changes.
- Fix misc test cases

RepeatedValueVectors
- get rid of multiple #load methods and rely on load(metadata, buffer) 
regardless of the vector type.

BaseValueVector
- all vectors must have a field

VariableLengthVectors
- all vectors must have a field, as does the offsets vector

MapVector
- refactor #load for better code readability and consistency

RepeatedFixedWidthVectorLike & RepeatedVariableWidthVectorLike
- #load is now unneeded


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
12dce2596a7d590d2d33c85fca5c47acb2495a25 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
bd41e10d3f69e13d0f8c426460af5e9a09d93fd9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java
 de6f66598927bee5fe3afb1d1046cc20c136b424 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java 
ec409a3fc59616708226aa500ccab1680cd261f6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 af364bd5487eb510b2ffc8219a37611b828136e5 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 f292e4c22f07a65036dbdae2cbe809e00758faf6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java
 d0f38c2a397aac7eaad247c39b4b856c89c970a0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
 fb7ed2a975e095b1f49d3a24cd735b8c7551c7f1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
 f6d3d88ca32a6a3320ec317dbe99c0031f0d54c6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 4617ede4aaaebe36dbe9f50dccbff107351dc40f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
 c57143e24e2c60ba400c4f3fcfb9c012e5e25a89 
  
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/TestEmptyPopulation.java
 06a73e22b981d5b8ae7289c171afa703eab91788 
  
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/fn/TestJsonReaderWithSparseFiles.java
 544b962142e24f6c185ad91384d6cb270776acb3 
  protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserBitShared.java 
bee2a3dfac6325460e9902d901b72a6c72cb7d81 
  protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 
92afa4f4b6fc223fb9179887d808c3f2925f303b 
  protocol/src/main/java/org/apache/drill/exec/proto/beans/SerializedField.java 
699097a0ab45468b90ec8f5141108c844dbfadf1 
  protocol/src/main/protobuf/UserBitShared.proto 
68c8612dadfc5cbe4a8157ec516ddf7246f5b956 

Diff: https://reviews.apache.org/r/35329/diff/


Testing
---

unit + regression


Thanks,

Hanifi Gunes



Re: Apache Drill reviews

2015-06-09 Thread Hanifi Gunes
Rebase your patch on top of master. Take a diff against master and upload
it. This should do it. Good luck.

-Hanifi


On Mon, Jun 8, 2015 at 7:32 PM, Matt Burgess mattyb...@gmail.com wrote:

 I apologize, I've been trying to get an update to my patch for DRILL-3199
 reviewed, but I seem to be screwing up how to update the diff at
 reviews.apache.org.

 Should the updates be incremental commits/patches, or should they be diffs
 against the target branch entirely? I've tried both but either got something
 wrong with generating the patch from Git commits, or something wrong with
 submitting them to the review.

 Thanks in advance,
 Matt



Re: Build fails on exec/Java Execution Engine step.

2015-06-09 Thread Hanifi Gunes
-DskipTests simply skips the test run, but it does not explain why we got
failures. Would you share the logs for the failing tests as well?

-Hanifi

On Tue, Jun 9, 2015 at 11:23 AM, Chris Westin chriswesti...@gmail.com
wrote:

 That may get you through the build, but it does mean you're having a couple
 of unit test failures, so there might be a setup issue on your machine.

 On Mon, Jun 8, 2015 at 10:30 PM, jhsbeat jhsb...@gmail.com wrote:

 
  Yes. Thanks.
 
  It’s working fine now with `-DskipTests` option.
 
  Thanks. :)
 
  On Jun 9, 2015, at 2:19 PM, Rajkumar Singh rsi...@maprtech.com wrote:
 
   try maven argument -DskipTests=true ?
 
 



Re: Review Request 35030: DRILL-1760: implement count(nested-type)

2015-06-03 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35030/
---

(Updated June 3, 2015, 10:27 p.m.)


Review request for drill and Mehant Baid.


Changes
---

added description


Repository: drill-git


Description (updated)
---

DRILL-1760: implement count(nested-type)

CountAggrTypes.java & CountAggregateFunctions.java
- Introduced count over nested type

Vectors & readers
- Implemented isSet/isNull to behave as expected since these methods are now 
used by count(complex-type)
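
For reference, a plain-Java illustration of the assumed semantics: count skips
nulls, so only rows whose nested value is set contribute.

import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class CountNestedDemo {
  public static void main(String[] args) {
    // A nested/repeated column: two set values, one null, one empty-but-set list.
    List<List<Integer>> column =
        Arrays.asList(Arrays.asList(1, 2), null, Arrays.<Integer>asList(), Arrays.asList(3));
    long count = column.stream().filter(Objects::nonNull).count();
    System.out.println(count); // 3 -- the null entry does not contribute
  }
}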


Diffs
-

  exec/java-exec/src/main/codegen/data/CountAggrTypes.tdd 
53e25f73ed88846cb05ad95b6aeaa722409f7bd4 
  exec/java-exec/src/main/codegen/templates/CountAggregateFunctions.java 
71ac6a7dc831de9917e9468c2630f2242a576aeb 
  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
7b2b78d80254abee8d380586fb4be64fee335b24 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
 d5a0d6268378d958c2e8b50826e6b78bd0c1850f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java
 d0f38c2a397aac7eaad247c39b4b856c89c970a0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847ba07e2543b122009a06eafccf06b89b43a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedListReaderImpl.java
 36e9beedbbb037564962d868b276e5d9d0c14140 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedMapReaderImpl.java
 b2fe7b7fc532bfd0b52559864404906107132ea9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/SingleMapReaderImpl.java
 1b39775f35403ad526756cc7fe5e08d2c393a99e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestCountFunctions.java
 PRE-CREATION 
  exec/java-exec/src/test/resources/functions/count-data.json PRE-CREATION 
  exec/java-exec/src/test/resources/parquet/count-data.parquet PRE-CREATION 

Diff: https://reviews.apache.org/r/35030/diff/


Testing
---


Thanks,

Hanifi Gunes



Re: question about correlated arrays and flatten

2015-06-02 Thread Hanifi Gunes
That's right. I guess that's what I am proposing to have here implicitly. I
am not sure how feasible this would be, however, we should be able to
interpret inline lambda like expressions. This is something to discuss as
we improve Drill's complex data handling capabilities. I see a great value
added here - especially for computationally-intense workloads.

select fold(t.numbers, 0, (r, c) => r + c), map(t.numbers, (n) => n*n) from
dfs.`some/table` t
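
Purely for illustration, the same fold/map over a Java collection shows what
those primitives would compute:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FoldMapDemo {
  public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
    // fold(numbers, 0, (r, c) => r + c)
    int folded = numbers.stream().reduce(0, (r, c) -> r + c); // 10
    // map(numbers, (n) => n*n)
    List<Integer> squared =
        numbers.stream().map(n -> n * n).collect(Collectors.toList()); // [1, 4, 9, 16]
    System.out.println(folded + " " + squared);
  }
}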

-Hanifi

On Mon, Jun 1, 2015 at 3:28 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 How could we make functional primitives work without lambda?



 On Mon, Jun 1, 2015 at 9:55 PM, Hanifi Gunes hgu...@maprtech.com wrote:

  The idea of having functional primitives in Drill sounds really handy. It
  would be great if we could support left-right folding as well. I can see
  many great use cases of project/map, fold/reduce, zip, flatten when
  combined.
 
  On Sat, May 30, 2015 at 12:57 AM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   OK.  I will file a JIRA for a zip function.  No idea if I will be able
 to
   get one written in the available cracks of time.
  
  
  
   On Fri, May 29, 2015 at 7:17 PM, Steven Phillips 
 sphill...@maprtech.com
  
   wrote:
  
I think your use case could be solved by adding a UDF that can
 combine
multiple arrays into a single array. The result of this function
 could
   then
be handled by our current implementation of flatten.
   
I think this is preferable to enhancing flatten itself to handle it,
   since
flatten is not an ordinary UDF, and thus more difficult to modify and
maintain.
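
A plain-Java sketch of that combine step (not a Drill UDF, just the zip
semantics that flatten would then explode into rows; per Ted's preference
it errors on a length mismatch):

import java.util.ArrayList;
import java.util.List;

class ZipSketch {
  static <A, B> List<Object[]> zip(List<A> a, List<B> b) {
    if (a.size() != b.size()) {
      throw new IllegalArgumentException("zip: arrays differ in length");
    }
    List<Object[]> out = new ArrayList<>(a.size());
    for (int i = 0; i < a.size(); i++) {
      out.add(new Object[] { a.get(i), b.get(i) }); // one (a, b) pair per row
    }
    return out;
  }
}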
   
On Fri, May 29, 2015 at 3:20 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
   
 My particular use case can throw an error if the lists are
 different
 length.

 I think our real goal should be to have a logically complete set of
simple
 primitives that lets any sort of back and forward conversions of
 this
kind.




 On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse 
altekruseja...@gmail.com
 
 wrote:

  I understand what you want to do, unfortunately we don't have
  support
for
  this right now. A UDF is the best I can suggest at this point.
 
  Just to explore the idea a little further for the sake of
 creating
  a
  complete feature request, I assume you would just want nulls
 filled
   in
 for
  the cases where the lists were different lengths?
 
  On Fri, May 29, 2015 at 8:58 AM, Ted Dunning 
  ted.dunn...@gmail.com
  wrote:
 
   Input is here:
   https://gist.github.com/tdunning/07ce66e7e4d4af41afd7
  
   Output is here:
https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e
  
   log-synth schema for generating input data is here:
   https://gist.github.com/tdunning/638dd52c00569ffa9582
  
  
   Preferred syntax would be like
  
   select flatten(t, v1, v2) from ...
  
  
  
  
   On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala 
   nrentachint...@maprtech.com wrote:
  
Ted
can you pls give an example with few data elements in a, b
 and
   the
   expected
output you are looking from the query.
   
-Neeraja
   
On Fri, May 29, 2015 at 6:43 AM, Ted Dunning 
ted.dunn...@gmail.com
wrote:
   
 I have two arrays.  Their elements are correlated times and
values.
  I
 would like to flatten them into rows, each with two
 elements.

 The query

select flatten(a), flatten(b) from ...

 doesn't work because I get the cartesian product (of
 course).
The
   query

select flatten(a, b) from ...

 also doesn't work because flatten doesn't have a
  multi-argument
 form.

 Going crazy, this query kind of sort of almost works, but
 not
 really:

  select r.x.`key`, flatten(r.x.`value`)  from (

  select flatten(kvgen(x)) as x from ...) r;

 What I really want to see is something like this:
select zip(flatten(a), flatten(b)) from ...

 Any pointers?  Is my next step to write a UDF?

   
  
 

   
   
   
--
 Steven Phillips
 Software Engineer
   
 mapr.com
   
  
 



Re: Review Request 34838: DRILL-3155: Part 1

2015-06-02 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34838/#review86320
---

Ship it!


Ship It!

- Hanifi Gunes


On June 2, 2015, 8:14 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34838/
 ---
 
 (Updated June 2, 2015, 8:14 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
This patch is a simple refactoring. Moved the classes related to complex 
vectors into the appropriate package.
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
  00a78fd 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
  b8d040c 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
  323bf43 
   exec/java-exec/src/main/java/org/apache/drill/exec/store/VectorHolder.java 
 e602fd7 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
  2b929a4 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
  0cbd480 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/AllocationHelper.java
  eddefd0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
  d5a0d62 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/ContainerVectorLike.java
  95e3365 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedFixedWidthVectorLike.java
  450c673 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedMutator.java
  8e097e4 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedValueVector.java
  95a7252 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedVariableWidthVectorLike.java
  ac8589e 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/ContainerVectorLike.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
  a5553b2 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
  a97847b 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedValueVector.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34838/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 34839: DRILL-3155: Part 2

2015-06-02 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/#review86325
---

Ship it!


Ship It!

- Hanifi Gunes


On June 2, 2015, 9:49 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34839/
 ---
 
 (Updated June 2, 2015, 9:49 p.m.)
 
 
 Review request for drill and Hanifi Gunes.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 While allocating memory for composite vectors, if one of the allocations fails 
 we need to release all the memory allocated up to that point.
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 b3389e2 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
  3c01939 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
  a97847b 
 
 Diff: https://reviews.apache.org/r/34839/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Mehant Baid
 




Re: HbaseTestSuite failures

2015-06-02 Thread Hanifi GUNES
This typically indicates that you have another HBase instance running. Make
sure you have only one instance running; the test run should then complete.

Unit tests should not share resources with a local HBase instance. We
should fix this. Did you file a JIRA for this by any chance?
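
In the meantime, a quick pre-flight check along these lines can tell you
whether some other process (for example, an already-running HBase or its
ZooKeeper) holds a port the mini-cluster wants. This is a hypothetical
helper, and port 2181 is an assumption; adjust it to your local setup:

  import java.io.IOException;
  import java.net.ServerSocket;

  public class PortCheck {
    // Try to bind the port: success means it is free, failure means some
    // other process is already listening on it.
    static boolean isFree(int port) {
      try (ServerSocket ignored = new ServerSocket(port)) {
        return true;
      } catch (IOException e) {
        return false;
      }
    }

    public static void main(String[] args) {
      System.out.println("port 2181 free: " + isFree(2181));
    }
  }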

-Hanifi


2015-06-02 14:26 GMT-07:00 Abdel Hakim Deneche adene...@maprtech.com:

 did you try it on master ?

 On Tue, Jun 2, 2015 at 1:52 PM, Sudheesh Katkam skat...@maprtech.com
 wrote:

  Hi Drillers,
 
  When I run unit tests (mvn clean install), I am getting:
 
  Running org.apache.drill.hbase.HBaseTestsSuite
  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 34.951 sec
  <<< FAILURE! - in org.apache.drill.hbase.HBaseTestsSuite
  org.apache.drill.hbase.HBaseTestsSuite  Time elapsed: 34.951 sec  <<< ERROR!
  java.io.IOException: Shutting down
  at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:190)
  at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:425)
  at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:224)
  at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:93)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:912)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:881)
  at org.apache.drill.hbase.HBaseTestsSuite.initCluster(HBaseTestsSuite.java:88)
 
  Results :
 
  Tests in error:
HBaseTestsSuite.initCluster:88 » IO Shutting down
 
  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
 
  Is anyone else seeing this?
 
  Thank you,
  Sudheesh




 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training
 
 http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
 



Re: known issue? Problem reading JSON

2015-06-01 Thread Hanifi Gunes
* The former query (select) does read and vectorize every single
field/column, thus field type matters, whereas the latter (count) does not
really read at the field level but simply counts individual JSON records,
and is thereby very efficient in time (~90x on a single very wide record)
and memory.

On Mon, Jun 1, 2015 at 12:38 PM, Hanifi Gunes hgu...@maprtech.com wrote:

 The fact that count does not fail but select fails is known and will be
 there at least until we support heterogeneous types. Also, we handle these
 queries differently in the JSON processor. The former query does read and
 vectorize every single field/column, thus field type matters, whereas the
 latter does not really read at the field level but simply counts individual
 JSON records, and is thereby very efficient in time (~90x on a single very
 wide record) and memory. That's the reason why your count(*) query succeeds
 while your select * query fails.

 I agree that error messages need a touch. Filed DRILL-3231 to track this.


 On Sat, May 30, 2015 at 10:51 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 OK.

 But this *is* in a data file that we distribute as part of Drill.

 Perhaps a better error message is warranted?

 Also, this seems to be a serious limitation that appears only to be
 fixable
 using a sledge-hammer.



 On Sun, May 31, 2015 at 3:31 AM, Jacques Nadeau jacq...@apache.org
 wrote:

  The second error is stating that you have a column that is a string in
  one row and a double in another.
 
  On Sat, May 30, 2015 at 3:16 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   This seems wrong.  I can count the records in a JSON table, but select *
   doesn't work.
  
   Is this a known issue?
  
  
  
   ted:apache-drill-1.0.0$ bin/drill-embedded
   Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
   MaxPermSize=512M; support was removed in 8.0
   May 31, 2015 12:14:52 AM
 org.glassfish.jersey.server.ApplicationHandler
   initialize
   INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
   01:25:26...
   apache drill 1.0.0
   got drill?
   0: jdbc:drill:zk=local *select count(*) from
   cp.`sales_fact_1997_collapsed.json` ;*
   +---------+
   | EXPR$0  |
   +---------+
   | 86837   |
   +---------+
   1 row selected (1.316 seconds)
   0: jdbc:drill:zk=local *select * from
  cp.`sales_fact_1997_collapsed.json`
   limit 3;*
   Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt
   type when you are using a ValueWriter of type NullableFloat8WriterImpl.
  
   File  /sales_fact_1997_collapsed.json
   Record  3
   Fragment 0:0
  
   [Error Id: 8a9ac2c1-9764-42fd-bdeb-ec0b5e408438 on 192.168.1.38:31010]
   (state=,code=0)
   0: jdbc:drill:zk=local *ALTER SYSTEM SET
   `store.json.read_numbers_as_double` = true;*
   +-------+---------------------------------------------+
   |  ok   |                   summary                   |
   +-------+---------------------------------------------+
   | true  | store.json.read_numbers_as_double updated.  |
   +-------+---------------------------------------------+
   1 row selected (0.086 seconds)
   0: jdbc:drill:zk=local *select * from
  cp.`sales_fact_1997_collapsed.json`
   limit 3;*
   Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar
   type when you are using a ValueWriter of type NullableFloat8WriterImpl.
  
   File  /sales_fact_1997_collapsed.json
   Record  47
   Fragment 0:0
  
 





Re: known issue? Problem reading JSON

2015-06-01 Thread Hanifi Gunes
The fact that count does not fail but select fails is known and will be
there at least until we support heterogeneous types. Also, we handle these
queries differently in the JSON processor. The former query does read and
vectorize every single field/column, thus field type matters, whereas the
latter does not really read at the field level but simply counts individual
JSON records, and is thereby very efficient in time (~90x on a single very
wide record) and memory. That's the reason why your count(*) query succeeds
while your select * query fails.

I agree that error messages need a touch. Filed DRILL-3231 to track this.


On Sat, May 30, 2015 at 10:51 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 OK.

 But this *is* in a data file that we distribute as part of Drill.

 Perhaps a better error message is warranted?

 Also, this seems to be a serious limitation that appears only to be fixable
 using a sledge-hammer.



 On Sun, May 31, 2015 at 3:31 AM, Jacques Nadeau jacq...@apache.org
 wrote:

  The second error is stating that you have a column that is a string in
  one row and a double in another.
 
  On Sat, May 30, 2015 at 3:16 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   This seems wrong.  I can count the records in a JSON table, but select *
   doesn't work.
  
   Is this a known issue?
  
  
  
   ted:apache-drill-1.0.0$ bin/drill-embedded
   Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
   MaxPermSize=512M; support was removed in 8.0
   May 31, 2015 12:14:52 AM org.glassfish.jersey.server.ApplicationHandler
   initialize
   INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
   01:25:26...
   apache drill 1.0.0
   got drill?
   0: jdbc:drill:zk=local *select count(*) from
   cp.`sales_fact_1997_collapsed.json` ;*
   +---------+
   | EXPR$0  |
   +---------+
   | 86837   |
   +---------+
   1 row selected (1.316 seconds)
   0: jdbc:drill:zk=local *select * from
  cp.`sales_fact_1997_collapsed.json`
   limit 3;*
   Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt
   type when you are using a ValueWriter of type NullableFloat8WriterImpl.
  
   File  /sales_fact_1997_collapsed.json
   Record  3
   Fragment 0:0
  
   [Error Id: 8a9ac2c1-9764-42fd-bdeb-ec0b5e408438 on 192.168.1.38:31010]
   (state=,code=0)
   0: jdbc:drill:zk=local *ALTER SYSTEM SET
   `store.json.read_numbers_as_double` = true;*
   +-------+---------------------------------------------+
   |  ok   |                   summary                   |
   +-------+---------------------------------------------+
   | true  | store.json.read_numbers_as_double updated.  |
   +-------+---------------------------------------------+
   1 row selected (0.086 seconds)
   0: jdbc:drill:zk=local *select * from
  cp.`sales_fact_1997_collapsed.json`
   limit 3;*
   Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar
   type when you are using a ValueWriter of type NullableFloat8WriterImpl.
  
   File  /sales_fact_1997_collapsed.json
   Record  47
   Fragment 0:0
  
 



Re: question about correlated arrays and flatten

2015-06-01 Thread Hanifi Gunes
The idea of having functional primitives in Drill sounds really handy. It
would be great if we could support left and right folds as well. I can see
many great use cases for project/map, fold/reduce, zip, and flatten when
combined.
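
As a concrete starting point, below is a rough, hypothetical sketch of the
zip UDF suggested down-thread: it pairs two repeated BIGINT columns into a
repeated map that flatten can then explode. It assumes the 1.0-era complex
writer API and the generated repeated holders (start/end/vector fields);
types inside eval() are fully qualified because Drill inlines UDF bodies
during code generation. Treat it as an illustration, not the eventual
implementation:

  import org.apache.drill.exec.expr.DrillSimpleFunc;
  import org.apache.drill.exec.expr.annotations.FunctionTemplate;
  import org.apache.drill.exec.expr.annotations.Output;
  import org.apache.drill.exec.expr.annotations.Param;
  import org.apache.drill.exec.expr.holders.RepeatedBigIntHolder;
  import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter;

  @FunctionTemplate(name = "zip", scope = FunctionTemplate.FunctionScope.SIMPLE,
      nulls = FunctionTemplate.NullHandling.INTERNAL)
  public class ZipBigIntsFunction implements DrillSimpleFunc {
    @Param RepeatedBigIntHolder a;
    @Param RepeatedBigIntHolder b;
    @Output ComplexWriter out;

    public void setup() { }

    public void eval() {
      org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list =
          out.rootAsList();
      list.startList();
      // Pair elements up to the shorter array; throwing on a length mismatch
      // (Ted's preference) would be an easy variation here.
      final int n = java.lang.Math.min(a.end - a.start, b.end - b.start);
      for (int i = 0; i < n; i++) {
        org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter entry =
            list.map();
        entry.start();
        entry.bigInt("first").writeBigInt(a.vector.getAccessor().get(a.start + i));
        entry.bigInt("second").writeBigInt(b.vector.getAccessor().get(b.start + i));
        entry.end();
      }
      list.endList();
    }
  }

With such a function in place, the query would read roughly select
flatten(zip(a, b)) from ..., each output row carrying a {first, second} map.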

On Sat, May 30, 2015 at 12:57 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 OK.  I will file a JIRA for a zip function.  No idea if I will be able to
 get one written in the available cracks of time.



 On Fri, May 29, 2015 at 7:17 PM, Steven Phillips sphill...@maprtech.com
 wrote:

  I think your use case could be solved by adding a UDF that can combine
  multiple arrays into a single array. The result of this function could
  then be handled by our current implementation of flatten.
 
  I think this is preferable to enhancing flatten itself to handle it, since
  flatten is not an ordinary UDF, and thus more difficult to modify and
  maintain.
 
  On Fri, May 29, 2015 at 3:20 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   My particular use case can simply throw an error if the lists are of
   different lengths.
  
   I think our real goal should be to have a logically complete set of
   simple primitives that allows any sort of back-and-forth conversions of
   this kind.
  
  
  
  
   On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse 
  altekruseja...@gmail.com
   
   wrote:
  
    I understand what you want to do; unfortunately, we don't have support
    for this right now. A UDF is the best I can suggest at this point.
   
    Just to explore the idea a little further for the sake of creating a
    complete feature request, I assume you would just want nulls filled in
    for the cases where the lists were different lengths?
   
On Fri, May 29, 2015 at 8:58 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
   
 Input is here:
 https://gist.github.com/tdunning/07ce66e7e4d4af41afd7

 Output is here:
  https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e

 log-synth schema for generating input data is here:
 https://gist.github.com/tdunning/638dd52c00569ffa9582


 Preferred syntax would be like

 select flatten(t, v1, v2) from ...




 On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala 
 nrentachint...@maprtech.com wrote:

  Ted
  can you please give an example with a few data elements in a and b, and
  the expected output you are looking for from the query.
 
  -Neeraja
 
  On Fri, May 29, 2015 at 6:43 AM, Ted Dunning 
  ted.dunn...@gmail.com
  wrote:
 
   I have two arrays.  Their elements are correlated times and values.  I
   would like to flatten them into rows, each with two elements.
  
   The query
  
  select flatten(a), flatten(b) from ...
  
   doesn't work because I get the Cartesian product (of course). The query
  
  select flatten(a, b) from ...
  
   also doesn't work because flatten doesn't have a multi-argument
   form.
  
   Going crazy, this query kind of sort of almost works, but not
   really:
  
select r.x.`key`, flatten(r.x.`value`)  from (
  
select flatten(kvgen(x)) as x from ...) r;
  
   What I really want to see is something like this:
  select zip(flatten(a), flatten(b)) from ...
  
   Any pointers?  Is my next step to write a UDF?
  
 

   
  
 
 
 
  --
   Steven Phillips
   Software Engineer
 
   mapr.com
 


