[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-158462636
  
lgtm.

Do you want to add a unit test that verifies the buffer reader index == 
zero behavior in addition to the integration-ish test you added? 




Re: Naming the new ValueVector Initiative

2015-11-20 Thread Jacques Nadeau
Ok, it looks like we have a candidate list (we actually got 11 since there
was a three-way tie for ninth place):

Vector, Arrow, honeycomb, Herringbone, joist, V2, Piet, colbuf, baton, impulse, victor
Next we need to do trademark searches on each of these to see whether we're
likely to have success. I've moved candidates to a second tab:

https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=304381532

Anybody want to give a hand in analyzing potential conflicts?



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Nov 16, 2015 at 12:10 PM, Jacques Nadeau  wrote:

> Everybody should pick their ten favorites using the numbers 1 to 10.
>
> 10 is most preferred
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Nov 16, 2015 at 10:17 AM, Ted Dunning 
> wrote:
>
>>
>> Single vote for most preferred?
>>
>> Single transferable vote?
>>
>>
>>
>> On Tue, Nov 17, 2015 at 2:50 AM, Jacques Nadeau 
>> wrote:
>>
>>> Given that a bunch of people added names to the sheet, I'll take that as
>>> tacit agreement to the proposed process.
>>>
>>> Let's move to the first vote phase. I've added a column for everybody's
>>> votes. Let's try to wrap up the vote by 10am on Wednesday.
>>>
>>> thanks!
>>> Jacques
>>>
>>>
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>> On Thu, Nov 12, 2015 at 12:03 PM, Jacques Nadeau 
>>> wrote:
>>>
 Hey Guys,

 It sounds like we need to do a little more work on the Vector proposal
 before the board would like to consider it. The main point of contention
 right now is the name of the project. We need to decide on a name and
 get
 it signed off through PODLINGNAMESEARCH.

 Naming is extremely subjective, so I'd like to propose a process for
 selection that minimizes pain. This is an initial proposal and is open to
 change.

 We do the naming in the following steps:
 - 1: Collect a set of names to be considered
 - 2: Run a vote for 2 days where each member ranks their top 10 options
 1..10
 - 3: Take the top ten vote getters and do a basic analysis of whether we
 think any have legal issues. Keep dropping names that do until we get to
 10 reasonably solid candidate names
 - 4: Take the top ten names and give people 48 hours to rank their top 3
 names
 - 5: Start a PODLINGNAMESEARCH on the top-ranked one; if that doesn't
 work, try the second and third options.

 I suggest we take name suggestions for step 1 from everyone, but then
 constrain the voting to the proposed members of the new project [1]. We
 could just do this in a private email thread, but I think doing it on the
 Drill dev list is better in the interest of transparency. This isn't the
 perfect place for that, but I'm not sure a better place exists.

 I'm up for changing any or all of this depending on what others think.
 Just
 wanted to get the ball rolling on a proposed process.

 If this works, I've posted a doc at [2] that we can use for step 1.

 Thanks,
 Jacques

 [1] List of proposed new project members/voters: Todd Lipcon, Ted
 Dunning,
 Michael Stack, P. Taylor Goetz, Julian Hyde, Julien Le Dem, Jacques
 Nadeau,
 James Taylor, Jake Luciani, Parth Chandra, Alex Levenson, Marcel
 Kornacker,
 Steven Phillips, Hanifi Gunes, Wes McKinney, Jason Altekruse, David
 Alves,
 Zain Asgar, Ippokratis Pandis, Abdel Hakim Deneche, Reynold Xin.
 [2]

 https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=0

>>>
>>>
>>
>


[jira] [Created] (DRILL-4116) Inconsistent results with datetime functions on different machines

2015-11-20 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4116:


 Summary: Inconsistent results with datetime functions on different 
machines
 Key: DRILL-4116
 URL: https://issues.apache.org/jira/browse/DRILL-4116
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.3.0
Reporter: Rahul Challapalli
Priority: Critical


git.commit.id.abbrev=a6a0fc3

The query below yields different results on different machines:
System 1 :
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select datediff(date '1996-03-01', 
timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1;
+-+
| EXPR$0  |
+-+
| -346|
+-+
1 row selected (1.57 seconds)
{code}

System 2 :
{code}
0: jdbc:drill:drillbit=10.10.88.193> select datediff(date '1996-03-01', 
timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1;
+-+
| EXPR$0  |
+-+
| -347|
+-+
1 row selected (1.239 seconds)
{code}
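
A plausible explanation (an assumption on my part, not something confirmed above) is that the two drillbits run with different default time zones, so converting the timestamp to a date lands on a different calendar day before the day difference is taken. A minimal Joda-Time sketch of that effect (a hypothetical repro, not Drill's actual code path):

{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.Days;
import org.joda.time.LocalDate;

public class DateDiffTzSketch {
  public static void main(String[] args) {
    // Same instant, viewed in two zones: 17:32 UTC on Feb 10 is already
    // Feb 11 in Tokyo, which shifts the whole-day difference by one.
    final DateTime ts = new DateTime(1997, 2, 10, 17, 32, DateTimeZone.UTC);
    final LocalDate date = new LocalDate(1996, 3, 1);
    for (String id : new String[] {"America/Los_Angeles", "Asia/Tokyo"}) {
      LocalDate tsDate = ts.withZone(DateTimeZone.forID(id)).toLocalDate();
      // Prints -346 for America/Los_Angeles and -347 for Asia/Tokyo.
      System.out.println(id + ": " + Days.daysBetween(tsDate, date).getDays());
    }
  }
}
{code}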





ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Abdel Hakim Deneche
I'm looking at the external sort code and it uses the following method to
compute the allocated size of a batch:

  private long getBufferSize(VectorAccessible batch) {
    long size = 0;
    for (VectorWrapper<?> w : batch) {
      DrillBuf[] bufs = w.getValueVector().getBuffers(false);
      for (DrillBuf buf : bufs) {
        if (buf.isRootBuffer()) {   // only root buffers are counted
          size += buf.capacity();
        }
      }
    }
    return size;
  }


This method only accounts for root buffers, but when we have a receiver
below the sort, most (if not all) of the buffers are child buffers. This may
delay spilling, and increase the memory usage of the drillbit. If my
computations are correct, for a single query, one drillbit can allocate up
to 40GB without spilling once to disk.

Is there a specific reason we only account for root buffers?
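
For reference, a minimal sketch of the slicing effect using plain Netty buffers (Unpooled here is just for illustration, not how DrillBuf allocates): a slice is a zero-copy child view whose capacity() reports only the slice length, so a batch whose vectors hold only slices contributes nothing to a root-only sum.

  import io.netty.buffer.ByteBuf;
  import io.netty.buffer.Unpooled;

  ByteBuf root = Unpooled.buffer(1024); // root buffer: 1024 bytes
  ByteBuf slice = root.slice(0, 256);   // zero-copy child view
  root.capacity();                      // 1024
  slice.capacity();                     // 256; a root-only check
                                        // skips this memory entirely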

-- 

Abdelhakim Deneche

Software Engineer



[GitHub] drill pull request: DRILL-2618: handle queries over empty folders ...

2015-11-20 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/270#discussion_r45509104
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -190,22 +136,40 @@ private static String commonPath(FileStatus... paths) 
{
 break;
   }
 }
-URI oneURI = paths[0].getPath().toUri();
+URI oneURI = statuses.get(0).getPath().toUri();
 return new Path(oneURI.getScheme(), oneURI.getAuthority(), 
commonPath).toString();
   }
 
-  public static FileSelection create(DrillFileSystem fs, String parent, 
String path) throws IOException {
-Path p = new Path(parent,removeLeadingSlash(path));
-FileStatus[] status = fs.globStatus(p);
-if (status == null || status.length == 0) {
+  public static FileSelection create(final DrillFileSystem fs, final 
String parent, final String path) throws IOException {
+final Path combined = new Path(parent, removeLeadingSlash(path));
+final FileStatus[] statuses = fs.globStatus(combined);
+if (statuses == null) {
   return null;
 }
-if (status.length == 1) {
-  URI oneURI = status[0].getPath().toUri();
-  String selectionRoot = new Path(oneURI.getScheme(), 
oneURI.getAuthority(), p.toUri().getPath()).toString();
-  return new FileSelection(Collections.singletonList(status[0]), 
selectionRoot);
+return create(Lists.newArrayList(statuses), null, 
combined.toUri().toString());
+  }
+
+  /**
+   * Creates a {@link FileSelection selection} with the given file 
statuses/files and selection root.
+   *
+   * @param statuses  list of file statuses
+   * @param files  list of files
+   * @param root  root path for selections
+   *
+   * @return  null if creation of {@link FileSelection} fails with an 
{@link IllegalArgumentException}
+   *  otherwise a new selection.
+   *
+   * @see FileSelection#FileSelection(List, List, String)
+   */
+  public static FileSelection create(final List<FileStatus> statuses, 
final List<String> files, final String root) {
+try {
+  final String selectionRoot = Strings.isNullOrEmpty(root) ? 
commonPath(statuses) : root;
--- End diff --

thanks. done.




[GitHub] drill pull request: DRILL-2618: handle queries over empty folders ...

2015-11-20 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/270#discussion_r45509247
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -190,22 +136,40 @@ private static String commonPath(FileStatus... paths) 
{
 break;
   }
 }
-URI oneURI = paths[0].getPath().toUri();
+URI oneURI = statuses.get(0).getPath().toUri();
 return new Path(oneURI.getScheme(), oneURI.getAuthority(), 
commonPath).toString();
   }
 
-  public static FileSelection create(DrillFileSystem fs, String parent, 
String path) throws IOException {
-Path p = new Path(parent,removeLeadingSlash(path));
-FileStatus[] status = fs.globStatus(p);
-if (status == null || status.length == 0) {
+  public static FileSelection create(final DrillFileSystem fs, final 
String parent, final String path) throws IOException {
+final Path combined = new Path(parent, removeLeadingSlash(path));
+final FileStatus[] statuses = fs.globStatus(combined);
+if (statuses == null) {
   return null;
 }
-if (status.length == 1) {
-  URI oneURI = status[0].getPath().toUri();
-  String selectionRoot = new Path(oneURI.getScheme(), 
oneURI.getAuthority(), p.toUri().getPath()).toString();
-  return new FileSelection(Collections.singletonList(status[0]), 
selectionRoot);
+return create(Lists.newArrayList(statuses), null, 
combined.toUri().toString());
+  }
+
+  /**
+   * Creates a {@link FileSelection selection} with the given file 
statuses/files and selection root.
+   *
+   * @param statuses  list of file statuses
+   * @param files  list of files
+   * @param root  root path for selections
+   *
+   * @return  null if creation of {@link FileSelection} fails with an 
{@link IllegalArgumentException}
+   *  otherwise a new selection.
+   *
+   * @see FileSelection#FileSelection(List, List, String)
+   */
+  public static FileSelection create(final List<FileStatus> statuses, 
final List<String> files, final String root) {
+try {
+  final String selectionRoot = Strings.isNullOrEmpty(root) ? 
commonPath(statuses) : root;
+  return new FileSelection(statuses, files, selectionRoot);
+} catch (IllegalArgumentException ex) {
+  // we defer task of validating arguments for code consistency and 
inheritance support.
+  logger.debug("unable to create FileSelection", ex);
+  return null;
--- End diff --

I will create a separate JIRA for investigating all cases of null outcome 
handling.
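
In the meantime, a sketch of the caller-side guard this contract implies (hypothetical caller code; only the create signature comes from the diff above):

    FileSelection selection = FileSelection.create(statuses, null, root);
    if (selection == null) {
      // creation failed, or the glob matched nothing; callers must handle
      // the null outcome instead of assuming a non-null selection
      return null;
    }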




Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Steven Phillips
I think it is because we can't actually properly account for sliced
buffers. I don't remember for sure, but I think it might be because calling
buf.capacity() on a sliced buffer returns the capacity of the root buffer,
not the size of the slice. That may not be correct, but I think it was
something like that. Whatever it is, I am pretty sure it was giving wrong
results when there are sliced buffers.

I think we need the new allocator, along with proper transfer of
ownership, in order to do this correctly. Then we can just query the
allocator rather than trying to track it separately.
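
Roughly something like this (a sketch only; it assumes the operator's allocator, oContext.getAllocator() here, tracks ownership correctly once transfer lands, and spillThreshold/spill() are placeholders, not existing fields):

  long allocated = oContext.getAllocator().getAllocatedMemory();
  if (allocated > spillThreshold) {
    spill(); // spill to disk based on what the allocator actually holds
  }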

On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche  wrote:

> I'm looking at the external sort code and it uses the following method to
> compute the allocated size of a batch:
>
>   private long getBufferSize(VectorAccessible batch) {
>     long size = 0;
>     for (VectorWrapper<?> w : batch) {
>       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
>       for (DrillBuf buf : bufs) {
>         if (buf.isRootBuffer()) {
>           size += buf.capacity();
>         }
>       }
>     }
>     return size;
>   }
>
> This method only accounts for root buffers, but when we have a receiver
> below the sort, most (if not all) of the buffers are child buffers. This
> may delay spilling, and increase the memory usage of the drillbit. If my
> computations are correct, for a single query, one drillbit can allocate up
> to 40GB without spilling once to disk.
>
> Is there a specific reason we only account for root buffers?


[GitHub] drill pull request: DRILL-2618: handle queries over empty folders ...

2015-11-20 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/270#discussion_r45509395
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFileSelection.java
 ---
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.google.common.base.Preconditions;
+import org.apache.drill.exec.store.dfs.FileSelection;
+import 
org.apache.drill.exec.store.parquet.Metadata.ParquetTableMetadata_v1;
+
+/**
+ * Parquet specific {@link FileSelection selection} that carries out 
{@link ParquetTableMetadata_v1 metadata} along.
+ */
+public class ParquetFileSelection extends FileSelection {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ParquetFileSelection.class);
--- End diff --

sure. done.




[jira] [Created] (DRILL-4117) Ensure proper null outcome handling during FileSelection creation and its subclasses.

2015-11-20 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4117:
---

 Summary: Ensure proper null outcome handling during FileSelection 
creation and its subclasses.
 Key: DRILL-4117
 URL: https://issues.apache.org/jira/browse/DRILL-4117
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes


Hakim identified that the following does not make a null check on the result of 
FileSelection.create(...). This issue is to ensure proper null outcome handling 
during creation of FileSelection and its subclasses, or to return a non-null 
default type.

{quote}
FileSystemPartitionDescriptor.createNewGroupScan() passes the output to 
FileGroupScan.close() which expects it to be not null
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Release Apache Drill 1.3.0 (rc3)

2015-11-20 Thread Aman Sinha
+1  (binding)

-Downloaded source and built on Mac
-Ran unit tests
-Ran manual tests against parquet partitioned data with and without
metadata cache
-Examined Explain plans for a few queries with partition filters on BigInt
columns, verified partition pruning was working
-Examined query profiles in Web UI




On Thu, Nov 19, 2015 at 9:36 PM, Sudheesh Katkam 
wrote:

> +1 (non-binding)
>
> * ran queries (including cancellations) in embedded mode using the binary
> tarball on Ubuntu; verified states in web UI
> * built (including tests) from the source tarball on Ubuntu
> * verified checksums on the three tarball artifacts
>
> One minor nit:
> > select * from sys.version;
>
> +----------+------------+-----------------+--------------+--------------+-------------+
> | version  | commit_id  | commit_message  | commit_time  | build_email  | build_time  |
> +----------+------------+-----------------+--------------+--------------+-------------+
> | 1.3.0    | Unknown    |                 |              | Unknown      |             |
> +----------+------------+-----------------+--------------+--------------+-------------+
>
> However, the git.properties file has the right information.
>
> Also, it would be nice to get rid of this message when starting up in
> embedded mode. It does not happen in distributed mode.
> Nov 20, 2015 5:10:39 AM org.glassfish.jersey.server.ApplicationHandler
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
> 01:25:26..
>
> Thanks,
> Sudheesh
>
> > On Nov 19, 2015, at 2:25 PM, Julien Le Dem  wrote:
> >
> > +1 (non binding)
> > built from source and ran the tests on linux
> >
> > On Thu, Nov 19, 2015 at 2:22 PM, Hsuan Yi Chu 
> wrote:
> >
> >> (+1 no binding)
> >> Unit test on Mac
> >>
> >> Queries on Distributed mode. Things look fairly good.
> >>
> >>
> >> On Thursday, November 19, 2015, Khurram Faraaz 
> >> wrote:
> >>
> >>> (+1 non-binding)
> >>>
> >>> Built from source and ran unit tests on CentOS.
> >>>
> >>> On Thu, Nov 19, 2015 at 1:34 PM, Norris Lee  >>> > wrote:
> >>>
>  +1 (non-binding)
> 
>  Built from source on Linux.  Tested with ODBC driver against multiple
> >>> data
>  formats (Hive, JSON, CSV, Parquet, TSV, Avro).
> 
>  Norris
> 
>  -Original Message-
>  From: Jacques Nadeau [mailto:jacq...@dremio.com ]
>  Sent: Tuesday, November 17, 2015 10:27 PM
>  To: dev >
>  Subject: [VOTE] Release Apache Drill 1.3.0 (rc3)
> 
>  Hey Everybody,
> 
>  I'd like to propose a new release candidate of Apache Drill, version
>  1.3.0.  This is the fourth release candidate (rc3).  This addresses some
>  issues identified in the third release candidate, including some issues
>  with missing S3 dependencies, Avro deserialization issues and Parquet file
>  writer metadata additions. Note that this doesn't directly address
>  DRILL-4070, an issue that sunk the last candidate. Based on additional
>  conversations on the JIRA, the plan is to provide a separate migration
>  tool for people to rewrite their Parquet footers.
> 
>  The tarball artifacts are hosted at [2] and the maven artifacts are
> >>> hosted
>  at [3]. This release candidate is based on commit
>  cc127ff4ac6272d2cb1b602890c0b7c503ea2062 located at [4].
> 
>  The vote will be open for 72 hours ending at 10PM Pacific, November
> 20,
>  2015.
> 
>  [ ] +1
>  [ ] +0
>  [ ] -1
> 
>  thanks,
>  Jacques
> 
>  [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
>  [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc3/
>  [3] https://repository.apache.org/content/repositories/orgapachedrill-1016/
>  [4] https://github.com/jacques-n/drill/tree/drill-1.3.0
> 
> >>>
> >>
> >
> >
> >
> > --
> > Julien
>
>


[GitHub] drill pull request: DRILL-2915: After cartesian join is selected, ...

2015-11-20 Thread hsuanyi
GitHub user hsuanyi opened a pull request:

https://github.com/apache/drill/pull/271

DRILL-2915: After cartesian join is selected, Drill will apply the ne…

…w rule to try to merge multiple joins more completely.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsuanyi/incubator-drill DRILL-2915

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #271


commit 34a77a6c210f26cf70e555da880a66d1c3713919
Author: Hsuan-Yi Chu 
Date:   2015-11-20T06:42:14Z

DRILL-2915: After cartesian join is selected, Drill will apply the new rule 
to try to merge multiple joins more completely.






[GitHub] drill pull request: DRILL-2915: After cartesian join is selected, ...

2015-11-20 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/271#discussion_r45524312
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java ---
@@ -33,6 +33,30 @@
 public class TestExampleQueries extends BaseTestQuery {
 //  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestExampleQueries.class);
 
+  @Test // DRILL-2915
--- End diff --

If this is fixing join ordering, shouldn't this be doing plan verification 
in addition to checking the results?
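
For example, something along these lines (a sketch using PlanTestBase; the query and patterns are illustrative, not the actual DRILL-2915 case):

    @Test
    public void testJoinOrderPlan() throws Exception {
      final String query = "select * from cp.`tpch/nation.parquet` n, "
          + "cp.`tpch/region.parquet` r where n.n_regionkey = r.r_regionkey";
      // check the plan shape in addition to the query results
      PlanTestBase.testPlanMatchingPatterns(query,
          new String[] {"HashJoin"},        // must appear in the plan
          new String[] {"NestedLoopJoin"}); // must not appear
    }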




Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Hanifi GUNES
The problem with the above code is that not all vectors operate on root
buffers, which renders the accounting inaccurate. In fact, your example is a
perfect instance where vectors would point to non-root buffers for sure,
because of the slicing taking place at RecordBatchLoader#load [1].


1:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117

2015-11-20 11:41 GMT-08:00 Steven Phillips :

> I think it is because we can't actually properly account for sliced
> buffers. I don't remember for sure, but I think it might be because calling
> buf.capacity() on a sliced buffer returns the capacity of the root buffer,
> not the size of the slice. That may not be correct, but I think it was
> something like that. Whatever it is, I am pretty sure it was giving wrong
> results when there are sliced buffers.
>
> I think we need the new allocator, along with proper transfer of
> ownership, in order to do this correctly. Then we can just query the
> allocator rather than trying to track it separately.


[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread hsuanyi
Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-158554693
  
Added a unit test to ensure that, in DrillBuf, readerIndex is reset (pointing 
at 0) for both the clear and !clear cases.




Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Abdel Hakim Deneche
@Steven, I think DrillBuf.capacity() was changed at some point, I was
looking at the code and it seems to only return the size of the "slice" and
not the root buffer.

While waiting for the new allocator and transfer of ownership, would it
make sense to remove the check for root buffers, like this?

  private long getBufferSize(VectorAccessible batch) {
    long size = 0;
    for (VectorWrapper<?> w : batch) {
      DrillBuf[] bufs = w.getValueVector().getBuffers(false);
      for (DrillBuf buf : bufs) {
        size += buf.capacity();   // count every buffer, root or slice
      }
    }
    return size;
  }



On Fri, Nov 20, 2015 at 2:50 PM, Hanifi GUNES  wrote:

> The problem with the above code is that not all vectors operate on root
> buffers, which renders the accounting inaccurate. In fact, your example is a
> perfect instance where vectors would point to non-root buffers for sure,
> because of the slicing taking place at RecordBatchLoader#load [1].
>
>
> 1:
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117



-- 

Abdelhakim Deneche

Software Engineer



Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Jacques Nadeau
My gut is that any changes other than correct accounting will have little
impact on real world situations. Do you think the change will make enough
difference to be valuable?

It seems like capacity should return the length of the slice (since I
believe that fits with the general ByteBuf interface). If I reset
writerIndex(0), I should be able to write up to capacity() without issue. It
seems weird to return some other (mostly disconnected) value.
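
Concretely (a plain-Netty sketch of that invariant, with made-up sizes):

  ByteBuf slice = Unpooled.buffer(64).slice(16, 32);
  slice.writerIndex(0);
  // a slice's capacity() is its own length, so this fills it exactly
  assert slice.writableBytes() == slice.capacity(); // 32
  slice.writeZero(slice.capacity());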



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 20, 2015 at 3:28 PM, Abdel Hakim Deneche 
wrote:

> @Steven, I think DrillBuf.capacity() was changed at some point, I was
> looking at the code and it seems to only return the size of the "slice" and
> not the root buffer.
>
> While waiting for the new allocator and transfer of ownership, would it
> make sense to remove the check for root buffers, like this?
>
>   private long getBufferSize(VectorAccessible batch) {
>     long size = 0;
>     for (VectorWrapper<?> w : batch) {
>       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
>       for (DrillBuf buf : bufs) {
>         size += buf.capacity();
>       }
>     }
>     return size;
>   }


Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Hanifi GUNES
Seems like I am missing some input here. Are we talking about changing the
behavior of DrillBuf#capacity()? If so, I would +1 the following:

*- It seems like capacity should return the length of the slice (since
I believe that fits with the general ByteBuf interface). If I
reset writerIndex(0), I should be able to write up to capacity() without issue.*

2015-11-20 16:03 GMT-08:00 Jacques Nadeau :

> My gut is that any changes other than correct accounting will have little
> impact on real world situations. Do you think the change will make enough
> difference to be valuable?
>
> It seems like capacity should return the length of the slice (since I
> believe that fits with the general ByteBuf interface). If I reset
> writerIndex(0), I should be able to write up to capacity() without issue. It
> seems weird to return some other (mostly disconnected) value.


Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Abdel Hakim Deneche
I'm confused here. Looking at DrillBuf.capacity() it already returns the
length of the slice:

  @Override
  public int capacity() {
    return length;
  }




On Fri, Nov 20, 2015 at 4:11 PM, Hanifi GUNES  wrote:

> Seems like I am missing some input here. Are we talking about changing the
> behavior of DrillBuf#capacity()? If so, I would +1 the following:
>
> *- It seems like capacity should return the length of the slice (since
> I believe that fits with the general ByteBuf interface). If I
> reset writerIndex(0), I should be able to write up to capacity() without issue.*

Re: ExternalSort doesn't properly account for sliced buffers

2015-11-20 Thread Hanifi Gunes
@Abdel, back to your question:

*Is there a specific reason we only account for root buffers?*

My understanding was that we'd like to avoid double counting: since slices
are zero-copy, at any point you'd account for either the root buffer or the
slices, but not both together. That being said, however, I do not see any
reason why a vector would internally slice its underlying buffer except for
an invocation of VV#load, in which case the vector is given a non-root
buffer as well. So making this check on root buffer-ness seems unneeded to
me, since VV#getBuffers() should never return a root buffer and a slice of
the same buffer. I would like to be proven wrong though.

I guess, perhaps, Steven is referring to a previous implementation of
DrillBuf, but returning the slice width rather than the root buffer width
seems to conform to ByteBuf#capacity(), which is what the current code does
anyway.
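
To illustrate the double-counting hazard (plain Netty, made-up sizes):

  ByteBuf root = Unpooled.buffer(1024);
  ByteBuf a = root.slice(0, 512);
  ByteBuf b = root.slice(512, 512);
  // summing the root and its slices counts the same 1024 bytes twice
  long naive = root.capacity() + a.capacity() + b.capacity(); // 2048
  long actual = root.capacity();                              // 1024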

On Fri, Nov 20, 2015 at 4:26 PM, Abdel Hakim Deneche 
wrote:

> I'm confused here. Looking at DrillBuf.capacity() it already returns the
> length of the slice:
>
>   @Override
>   public int capacity() {
>     return length;
>   }

[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45541174
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
+final int length = 10;
+
+final Properties props = new Properties() {
+  {
+put(TopLevelAllocator.TOP_LEVEL_MAX_ALLOC, "100");
+put(TopLevelAllocator.ERROR_ON_MEMORY_LEAK, "true");
+  }
+};
+final DrillConfig config = DrillConfig.create(props);
+
+final BufferAllocator allc = RootAllocatorFactory.newRoot(config);
+final TypeProtos.MajorType.Builder builder = 
TypeProtos.MajorType.newBuilder();
+builder.setMinorType(TypeProtos.MinorType.INT);
+builder.setMode(TypeProtos.DataMode.REQUIRED);
+
+final IntVector iv = new IntVector(MaterializedField.create("Field", 
builder.build()), allc);
+iv.allocateNew();
+
+// Write data to DrillBuf
+for(int i = 0; i < length; ++i) {
+  iv.getBuffer().writeInt(i);
+}
+
+// Read data from DrillBuf
+for(int i = 0; i < length; ++i) {
+  assertEquals((long) i, (long) iv.getBuffer().readInt());
+}
+
+for(DrillBuf drillBuf : iv.getBuffers(false)) {
+  assertEquals((long) 0, drillBuf.readInt());
+}
+
+final List<DrillBuf> toBeClean = Lists.newArrayList();
+for(DrillBuf drillBuf : iv.getBuffers(true)) {
+  assertEquals((long) 0, drillBuf.readInt());
--- End diff --

you can use 0L rather than (long) 0




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45541175
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
+final int length = 10;
+
+final Properties props = new Properties() {
+  {
+put(TopLevelAllocator.TOP_LEVEL_MAX_ALLOC, "100");
+put(TopLevelAllocator.ERROR_ON_MEMORY_LEAK, "true");
+  }
+};
+final DrillConfig config = DrillConfig.create(props);
+
+final BufferAllocator allc = RootAllocatorFactory.newRoot(config);
+final TypeProtos.MajorType.Builder builder = 
TypeProtos.MajorType.newBuilder();
+builder.setMinorType(TypeProtos.MinorType.INT);
+builder.setMode(TypeProtos.DataMode.REQUIRED);
+
+final IntVector iv = new IntVector(MaterializedField.create("Field", 
builder.build()), allc);
+iv.allocateNew();
+
+// Write data to DrillBuf
+for(int i = 0; i < length; ++i) {
+  iv.getBuffer().writeInt(i);
+}
+
+// Read data from DrillBuf
+for(int i = 0; i < length; ++i) {
+  assertEquals((long) i, (long) iv.getBuffer().readInt());
--- End diff --

why are you casting here?




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-158586257
  
A few small comments. Otherwise, looks good. Thanks for the updates!




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45541183
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
--- End diff --

Maybe name ensureDrillBufReadIndexIsZero or add a new javadoc




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread hsuanyi
Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45541993
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
+final int length = 10;
+
+final Properties props = new Properties() {
+  {
+put(TopLevelAllocator.TOP_LEVEL_MAX_ALLOC, "100");
+put(TopLevelAllocator.ERROR_ON_MEMORY_LEAK, "true");
+  }
+};
+final DrillConfig config = DrillConfig.create(props);
+
+final BufferAllocator allc = RootAllocatorFactory.newRoot(config);
+final TypeProtos.MajorType.Builder builder = 
TypeProtos.MajorType.newBuilder();
+builder.setMinorType(TypeProtos.MinorType.INT);
+builder.setMode(TypeProtos.DataMode.REQUIRED);
+
+final IntVector iv = new IntVector(MaterializedField.create("Field", 
builder.build()), allc);
+iv.allocateNew();
+
+// Write data to DrillBuf
+for(int i = 0; i < length; ++i) {
+  iv.getBuffer().writeInt(i);
+}
+
+// Read data from DrillBuf
+for(int i = 0; i < length; ++i) {
+  assertEquals((long) i, (long) iv.getBuffer().readInt());
+}
+
+for(DrillBuf drillBuf : iv.getBuffers(false)) {
+  assertEquals((long) 0, drillBuf.readInt());
+}
+
+final List<DrillBuf> toBeClean = Lists.newArrayList();
+for(DrillBuf drillBuf : iv.getBuffers(true)) {
+  assertEquals((long) 0, drillBuf.readInt());
--- End diff --

Just cleaned it.




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread hsuanyi
Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45541995
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
+final int length = 10;
+
+final Properties props = new Properties() {
+  {
+put(TopLevelAllocator.TOP_LEVEL_MAX_ALLOC, "100");
+put(TopLevelAllocator.ERROR_ON_MEMORY_LEAK, "true");
+  }
+};
+final DrillConfig config = DrillConfig.create(props);
+
+final BufferAllocator allc = RootAllocatorFactory.newRoot(config);
+final TypeProtos.MajorType.Builder builder = 
TypeProtos.MajorType.newBuilder();
+builder.setMinorType(TypeProtos.MinorType.INT);
+builder.setMode(TypeProtos.DataMode.REQUIRED);
+
+final IntVector iv = new IntVector(MaterializedField.create("Field", 
builder.build()), allc);
+iv.allocateNew();
+
+// Write data to DrillBuf
+for(int i = 0; i < length; ++i) {
+  iv.getBuffer().writeInt(i);
+}
+
+// Read data from DrillBuf
+for(int i = 0; i < length; ++i) {
+  assertEquals((long) i, (long) iv.getBuffer().readInt());
--- End diff --

Yes, It is not necessary. Just cleaned it.




[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread hsuanyi
Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45542121
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/memory/TestAllocators.java 
---
@@ -63,6 +69,53 @@
   private final static String planFile="/physical_allocator_test.json";
 
   @Test
+  public void testDrillBufReadIndex() throws Exception {
--- End diff --

Changed the method name and added a doc to describe the scenario where the 
failing cases would occur.

Thanks for reviewing. 




[RESULT][VOTE] Release Apache Drill 1.3.0 (rc3)

2015-11-20 Thread Jacques Nadeau
Vote passes. Thanks everyone for your time voting. Final Tally:

3 x +1 (binding)
Jacques
Tomer
Aman

9 x +1 (non-binding)
Abhijit
Stefan
Norris
Khurram
Hsuan
Julien
Abhishek
Amit
Sudheesh

No +0 or -1's

I'll push the release artifacts and send an announcement once propagated.

thanks everyone!

Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 20, 2015 at 12:56 PM, Aman Sinha  wrote:

> +1  (binding)
>
> -Downloaded source and built on Mac
> -Ran unit tests
> -Ran manual tests against parquet partitioned data with and without
> metadata cache
> -Examined Explain plans for a few queries with partition filters on BigInt
> columns, verified partition pruning was working
> -Examined query profiles in Web UI


[GitHub] drill pull request: DRILL-3854: For convert_from, re-point readerI...

2015-11-20 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-158595453
  
lgtm +1

