[jira] [Commented] (DRILL-5542) Scan unnecessarily adds implicit columns to ScanRecordBatch for select * query

2017-05-25 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025745#comment-16025745
 ] 

Paul Rogers commented on DRILL-5542:


Thanks for tracking this down!

I wonder, how does the downstream operator know to remove the implicit columns? 
There is nothing in the column name or (it seems) physical plan to identify 
those columns as implicit. In the example for CSV, say, how would the 
downstream know that "columns" is OK, but "fqn" is not? Is this hard-coded 
somewhere?

If hardcoded, how does it know to pass along the "fqn" when it is requested?
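
To make the question concrete, here is a sketch using the CSV example from the description below (the column names are Drill's documented implicit columns):

{code}
-- Case 1: select *. The scan should project only `columns`; the implicit
-- columns (fqn, filepath, filename, suffix) are not needed downstream.
select * from dfs.tmp.`1.csv`;

-- Case 2: an explicit request. Here the scan must populate `filename`,
-- because the query names it.
select filename, columns from dfs.tmp.`1.csv`;
{code}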

In any event, for the readers that DRILL-5211 touches, I will address the issue 
in the revised scan batch code. The remaining readers will need attention from 
others.

> Scan unnecessarily adds implicit columns to ScanRecordBatch for select * query
> ------------------------------------------------------------------------------
>
> Key: DRILL-5542
> URL: https://issues.apache.org/jira/browse/DRILL-5542
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>
> It seems that Drill adds several implicit columns (`fqn`, `filepath`, 
> `filename`, `suffix`) to ScanBatch even when they are not required by any 
> downstream operator. Although those implicit columns are dropped later on, 
> they increase both memory and CPU overhead.
> 1. JSON
> {code}
> {a: 100}
> {code}
> {code}
> select * from dfs.tmp.`1.json`;
> +------+
> |  a   |
> +------+
> | 100  |
> +------+
> {code}
> The schema from ScanRecordBatch is:
> {code}
> [ schema:
> BatchSchema [fields=[fqn(VARCHAR:OPTIONAL), filepath(VARCHAR:OPTIONAL), 
> filename(VARCHAR:OPTIONAL), suffix(VARCHAR:OPTIONAL), a(BIGINT:OPTIONAL)], 
> selectionVector=NONE], 
>  {code}
> 2. Parquet
> {code}
> select * from cp.`tpch/nation.parquet`;
> +--------------+---------+--------------+------------------------------------------------------+
> | n_nationkey  | n_name  | n_regionkey  |                      n_comment                       |
> +--------------+---------+--------------+------------------------------------------------------+
> | 0            | ALGERIA | 0            |  haggle. carefully final deposits detect slyly agai  |
> ...
> {code}
> The schema of ScanRecordBatch:
> {code}
>   schema:
> BatchSchema [fields=[n_nationkey(INT:REQUIRED), n_name(VARCHAR:REQUIRED), 
> n_regionkey(INT:REQUIRED), n_comment(VARCHAR:REQUIRED), 
> fqn(VARCHAR:OPTIONAL), filepath(VARCHAR:OPTIONAL), 
> filename(VARCHAR:OPTIONAL), suffix(VARCHAR:OPTIONAL)], selectionVector=NONE], 
> {code}
> 3. Text
> {code}
> cat 1.csv
> a, b, c
> select * from dfs.tmp.`1.csv`;
> +----------------+
> |    columns     |
> +----------------+
> | ["a","b","c"]  |
> +----------------+
> {code}
> The schema of ScanRecordBatch:
> {code}
>   schema:
> BatchSchema [fields=[columns(VARCHAR:REPEATED)[$data$(VARCHAR:REQUIRED)], 
> fqn(VARCHAR:OPTIONAL), filepath(VARCHAR:OPTIONAL), 
> filename(VARCHAR:OPTIONAL), suffix(VARCHAR:OPTIONAL)], selectionVector=NONE], 
> {code}
> If implicit columns are not part of the query result of a `select *` query, 
> then the Scan operator should not populate those implicit columns.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025664#comment-16025664
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118614384
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapDrillTable.java
 ---
@@ -0,0 +1,73 @@
+/*
--- End diff --

Would be very helpful if this PR could include a package-info.java file to 
describe this work. For example: what is pcap? Links to good sources? What 
features of Drill does it use (push-downs)? Etc.
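
To make the suggestion concrete, a minimal sketch of such a file (the wording is illustrative, not from the PR):

```
/**
 * Drill format plugin for PCAP files.
 *
 * PCAP (packet capture) is the de facto standard format for storing
 * network capture data; see https://en.wikipedia.org/wiki/Pcap for
 * background. The plugin registers a format matcher and a record reader
 * that decodes packets into value vectors, and it reports support for
 * push-down.
 */
package org.apache.drill.exec.store.pcap;
```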


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025663#comment-16025663
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118619851
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
 ---
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap.decoder;
+
+import com.google.common.base.Preconditions;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+
+import static org.apache.drill.exec.store.pcap.Utils.convertInt;
+import static org.apache.drill.exec.store.pcap.Utils.convertShort;
+import static org.apache.drill.exec.store.pcap.Utils.getByte;
+import static org.apache.drill.exec.store.pcap.Utils.getIntFileOrder;
+import static org.apache.drill.exec.store.pcap.Utils.getShort;
+
+public class Packet {
--- End diff --

Would it have been possible to use one of the existing pcap Java libraries 
here? Four are listed [here](https://en.wikipedia.org/wiki/Pcap).


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025661#comment-16025661
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118616240
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+
+  private OutputMutator output;
+
+  private final PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer = new byte[10];
--- End diff --

Do you want to do this at construct time? If you scan 1000 pcap files in a 
single fragment, Drill will create 1000 record readers at the start of 
execution. Each will allocate a 100K buffer, so you'll have 100MB of heap tied 
up in buffers, of which only one is in use at any given time.

Suggestion: allocate the buffer in setup, clear it in close, so that only 
one buffer is used per fragment.
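
A sketch of that pattern (illustrative; it assumes `buffer` remains a field and that the reader can override close()):

```
@Override
public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
  this.output = output;
  this.buffer = new byte[100000]; // allocated only once this reader actually starts
}

@Override
public void close() {
  buffer = null; // release the heap as soon as this reader finishes
}
```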


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025658#comment-16025658
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118616406
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+
+  private OutputMutator output;
+
+  private final PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer = new byte[10];
+  private int offset = 0;
+  private InputStream in;
+  private int validBytes;
+
+  private static final Map<PcapTypes, MinorType> TYPES;
+
+  private static class ProjectedColumnInfo {
+    ValueVector vv;
+    ColumnDto pcapColumn;
+  }
+
+  static {
+    TYPES = ImmutableMap.<PcapTypes, MinorType>builder()
+        .put(PcapTypes.STRING, MinorType.VARCHAR)
+        .put(PcapTypes.INTEGER, MinorType.INT)
+        .put(PcapTypes.LONG, MinorType.BIGINT)
+        .put(PcapTypes.TIMESTAMP, MinorType.TIMESTAMP)
+        .build();
+  }
+
+  public PcapRecordReader(final String inputPath,
+                          final List<SchemaPath> projectedColumns) {
+    try {
+      this.in = new FileInputStream(inputPath);
--- End diff --

As noted above, by opening the file here, if you are scanning 1000 files, 
you'll have 1000 open file handles at the start of the fragment. Better to 
postpone opening files until setup.
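
One way the constructor/setup split could look (a sketch; it assumes an `inputPath` field is added, and it implies `decoder` could no longer be final):

```
public PcapRecordReader(final String inputPath, final List<SchemaPath> projectedColumns) {
  this.inputPath = inputPath; // just remember the path; do no I/O yet
  setColumns(projectedColumns);
}

@Override
public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
  this.output = output;
  try {
    this.in = new FileInputStream(inputPath); // opened lazily, one reader at a time
    this.decoder = getPacketDecoder();
    validBytes = in.read(buffer);
  } catch (IOException e) {
    throw new ExecutionSetupException("Failed to open " + inputPath, e);
  }
}
```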


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025662#comment-16025662
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118617482
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+
+  private OutputMutator output;
+
+  private final PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer = new byte[10];
+  private int offset = 0;
+  private InputStream in;
+  private int validBytes;
+
+  private static final Map<PcapTypes, MinorType> TYPES;
+
+  private static class ProjectedColumnInfo {
+    ValueVector vv;
+    ColumnDto pcapColumn;
+  }
+
+  static {
+    TYPES = ImmutableMap.<PcapTypes, MinorType>builder()
+        .put(PcapTypes.STRING, MinorType.VARCHAR)
+        .put(PcapTypes.INTEGER, MinorType.INT)
+        .put(PcapTypes.LONG, MinorType.BIGINT)
+        .put(PcapTypes.TIMESTAMP, MinorType.TIMESTAMP)
+        .build();
+  }
+
+  public PcapRecordReader(final String inputPath,
+                          final List<SchemaPath> projectedColumns) {
+    try {
+      this.in = new FileInputStream(inputPath);
+      this.decoder = getPacketDecoder();
+      validBytes = in.read(buffer);
+    } catch (IOException e) {
+      throw new RuntimeException("File " + inputPath + " not Found");
+    }
+    setColumns(projectedColumns);
+  }
+
+  @Override
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
+    this.output = output;
+  }
+
+  @Override
+  public int next() {
+    projectedCols = getProjectedColsIfItNull();
+    try {
+      return parsePcapFilesAndPutItToTable();
--- End diff --

Drill has certain protocols that are not entirely obvious, but that are 
needed here. Each call to `ne

[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025655#comment-16025655
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118615907
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatPlugin.java
 ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSelection;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatMatcher;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.MagicString;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.regex.Pattern;
+
+public class PcapFormatPlugin extends EasyFormatPlugin<PcapFormatConfig> {
+
+  private final PcapFormatMatcher matcher;
+
+  public PcapFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
+                          StoragePluginConfig storagePluginConfig) {
+    this(name, context, fsConf, storagePluginConfig, new PcapFormatConfig());
+  }
+
+  public PcapFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
+                          StoragePluginConfig config, PcapFormatConfig formatPluginConfig) {
+    super(name, context, fsConf, config, formatPluginConfig, true, false,
+        true, false, Lists.newArrayList("pcap"), "pcap");
+    this.matcher = new PcapFormatMatcher(this);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+    return true;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs,
+      FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException {
+    String path = dfs.makeQualified(new Path(fileWork.getPath())).toUri().getPath();
+    return new PcapRecordReader(path, columns);
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException {
+    return null;
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+    return 0;
--- End diff --

Seems awkward, but other format plugins add a type to a protobuf, then 
return that here:

```
return CoreOperatorType.JSON_SUB_SCAN_VALUE;
```

And `UserBitShared.proto`:

```
  JSON_SUB_SCAN = 29;
```

The next available number is 37.

This seems rather brittle. Seems we should have a more general solution. 
But, until we do, I'd guess you'll need to add the enum value.

As an alternative, `SequenceFileFormatPlugin` just makes up a number:

```
  public int getReaderOperatorType() {
    return 4001;
  }
```
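
Concretely, under that assumption (the name `PCAP_SUB_SCAN` and the value 37 are illustrative, pending whatever number is actually free):

```
// UserBitShared.proto
PCAP_SUB_SCAN = 37;
```

and then in the plugin:

```
@Override
public int getReaderOperatorType() {
  return CoreOperatorType.PCAP_SUB_SCAN_VALUE;
}
```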


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025659#comment-16025659
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118615528
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatPlugin.java
 ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSelection;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatMatcher;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.MagicString;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.regex.Pattern;
+
+public class PcapFormatPlugin extends EasyFormatPlugin<PcapFormatConfig> {
+
+  private final PcapFormatMatcher matcher;
+
+  public PcapFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
+                          StoragePluginConfig storagePluginConfig) {
+    this(name, context, fsConf, storagePluginConfig, new PcapFormatConfig());
+  }
+
+  public PcapFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
+                          StoragePluginConfig config, PcapFormatConfig formatPluginConfig) {
+    super(name, context, fsConf, config, formatPluginConfig, true, false,
+        true, false, Lists.newArrayList("pcap"), "pcap");
+    this.matcher = new PcapFormatMatcher(this);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+    return true;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs,
+      FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException {
+    String path = dfs.makeQualified(new Path(fileWork.getPath())).toUri().getPath();
+    return new PcapRecordReader(path, columns);
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException {
+    return null;
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+    return 0;
+  }
+
+  @Override
+  public int getWriterOperatorType() {
+    return 0;
--- End diff --

Other format plugins do the following when a writer is not supported:

```
throw new UnsupportedOperationException("unimplemented");
```
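
Applied to this plugin's getRecordWriter, replacing the current `return null` (a sketch):

```
@Override
public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException {
  throw new UnsupportedOperationException("unimplemented");
}
```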


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025657#comment-16025657
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118619596
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/Utils.java ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Shorts;
+
+public class Utils {
+
+  public static int getIntFileOrder(boolean byteOrder, final byte[] buf, final int offset) {
+    if (byteOrder) {
--- End diff --

Maybe add an explanation of the mapping from byte order to booleans? Which 
endianness do true and false correspond to?
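
For example, a Javadoc sketch would pin the convention down (whether true really means big-endian here is an assumption to verify against the decoder):

```
/**
 * Reads an int stored in file byte order.
 *
 * @param byteOrder true if the capture is big-endian, false if
 *                  little-endian (assumed mapping; confirm in decoder)
 */
public static int getIntFileOrder(boolean byteOrder, final byte[] buf, final int offset) {
  // ... existing implementation ...
}
```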


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025668#comment-16025668
 ] 

ASF GitHub Bot commented on DRILL-5432:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/831#discussion_r118616911
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+
+  private OutputMutator output;
+
+  private final PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer = new byte[10];
+  private int offset = 0;
+  private InputStream in;
+  private int validBytes;
+
+  private static final Map<PcapTypes, MinorType> TYPES;
+
+  private static class ProjectedColumnInfo {
+ValueVector vv;
+ColumnDto pcapColumn;
+  }
+
+  static {
+TYPES = ImmutableMap.<PcapTypes, MinorType>builder()
+.put(PcapTypes.STRING, MinorType.VARCHAR)
+.put(PcapTypes.INTEGER, MinorType.INT)
+.put(PcapTypes.LONG, MinorType.BIGINT)
+.put(PcapTypes.TIMESTAMP, MinorType.TIMESTAMP)
+.build();
+  }
+
+  public PcapRecordReader(final String inputPath,
+  final List<SchemaPath> projectedColumns) {
+try {
+  this.in = new FileInputStream(inputPath);
+  this.decoder = getPacketDecoder();
+  validBytes = in.read(buffer);
+} catch (IOException e) {
+  throw new RuntimeException("File " + inputPath + " not Found");
+}
+setColumns(projectedColumns);
+  }
+
+  @Override
+  public void setup(final OperatorContext context, final OutputMutator 
output) throws ExecutionSetupException {
+this.output = output;
+  }
+
+  @Override
+  public int next() {
+projectedCols = getProjectedColsIfItNull();
+try {
+  return parsePcapFilesAndPutItToTable();
+} catch (IOException io) {
+  throw new RuntimeException("Trouble with reading packets in file!");
+}
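
As an aside on the RuntimeException wrapping above: Drill readers usually surface read failures through UserException so the cause and file context reach the client. A hedged sketch of that pattern (the UserException builder calls exist in Drill; the surrounding method, the `inputPath` field, and the class's slf4j logger are assumed here):

{code}
// Illustrative only: rethrow the IOException with context instead of
// discarding it in a bare RuntimeException.
@Override
public int next() {
  projectedCols = getProjectedColsIfItNull();
  try {
    return parsePcapFilesAndPutItToTable();
  } catch (IOException io) {
    throw UserException.dataReadError(io)
        .message("Trouble with reading packets in file!")
        .addContext("File", inputPath)
        .build(logger);
  }
}
{code}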

[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025621#comment-16025621
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r118616162
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -205,10 +225,10 @@ drill.exec: {
 // Deprecated for managed xsort; used only by legacy xsort
 threshold: 4,
 // File system to use. Local file system by default.
-fs: "file:///"
+fs: ${drill.exec.spill.fs},
--- End diff --

Done. Added:

// -- The two options below can be used to override the options 
common
// -- for all spilling operators (see "spill" above).
// -- This is done for backward compatibility; in the future they
// -- would be deprecated (you should be using only the common ones)



> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradual spilling of memory to disk as the available memory gets too 
> small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5542) Scan unnecessary adds implicit columns to ScanRecordBatch for select * query

2017-05-25 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5542:
-

 Summary: Scan unnecessary adds implicit columns to ScanRecordBatch 
for select * query
 Key: DRILL-5542
 URL: https://issues.apache.org/jira/browse/DRILL-5542
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Jinfeng Ni


It seems that Drill adds several implicit columns (`fqn`, `filepath`, 
`filename`, `suffix`) to ScanBatch even when they are not required by the 
downstream operator. Although those implicit columns are dropped later on, they 
increase both memory and CPU overhead.

1. JSON
{code}
{a: 100}
{code}

{code}
select * from dfs.tmp.`1.json`;
+--+
|  a   |
+--+
| 100  |
+--+
{code}

The schema from ScanRecordBatch is :
{code}
[ schema:
BatchSchema [fields=[fqn(VARCHAR:OPTIONAL), filepath(VARCHAR:OPTIONAL), 
filename(VARCHAR:OPTIONAL), suffix(VARCHAR:OPTIONAL), a(BIGINT:OPTIONAL)], 
selectionVector=NONE], 
 {code}

2. Parquet
{code}
select * from cp.`tpch/nation.parquet`;
+--------------+----------+--------------+------------------------------------------------------+
| n_nationkey  | n_name   | n_regionkey  | n_comment                                            |
+--------------+----------+--------------+------------------------------------------------------+
| 0            | ALGERIA  | 0            |  haggle. carefully final deposits detect slyly agai  |
...
{code}

The schema of ScanRecordBatch:
{code}
  schema:
BatchSchema [fields=[n_nationkey(INT:REQUIRED), n_name(VARCHAR:REQUIRED), 
n_regionkey(INT:REQUIRED), n_comment(VARCHAR:REQUIRED), fqn(VARCHAR:OPTIONAL), 
filepath(VARCHAR:OPTIONAL), filename(VARCHAR:OPTIONAL), 
suffix(VARCHAR:OPTIONAL)], selectionVector=NONE], 
{code}

3. Text
{code}
cat 1.csv
a, b, c

select * from dfs.tmp.`1.csv`;
++
|columns |
++
| ["a","b","c"]  |
++
{code}

Schema of ScanRecordBatch 
{code}
  schema:
BatchSchema [fields=[columns(VARCHAR:REPEATED)[$data$(VARCHAR:REQUIRED)], 
fqn(VARCHAR:OPTIONAL), filepath(VARCHAR:OPTIONAL), filename(VARCHAR:OPTIONAL), 
suffix(VARCHAR:OPTIONAL)], selectionVector=NONE], 
{code}

If implicit columns are not part of the query result of a `select *` query, then 
the Scan operator should not populate those implicit columns.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025573#comment-16025573
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/832
  
Fixed typo in log message and rebased onto latest master.


> Vector validator to diagnose offset vector issues
> -
>
> Key: DRILL-5504
> URL: https://issues.apache.org/jira/browse/DRILL-5504
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become 
> corrupted, yielding a bogus field-length value that is orders of magnitude 
> larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a 
> "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator 
> validator batch iterator" to optionally allow vector validation on each batch.
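
The offset-vector check described above is straightforward to sketch (a minimal illustration, assuming Drill's UInt4Vector accessor API; this is not the actual validator code):

{code}
import org.apache.drill.exec.vector.UInt4Vector;

public class OffsetCheckSketch {
  // An offset vector for N values holds N+1 entries: it must start at 0,
  // be monotonically non-decreasing, and its last entry must not exceed
  // the length of the underlying data buffer.
  static void validateOffsets(UInt4Vector offsets, int valueCount, int dataLength) {
    int prev = offsets.getAccessor().get(0);
    if (prev != 0) {
      throw new IllegalStateException("Offset vector does not start at 0: " + prev);
    }
    for (int i = 1; i <= valueCount; i++) {
      int cur = offsets.getAccessor().get(i);
      if (cur < prev || cur > dataLength) {
        throw new IllegalStateException("Corrupt offset at index " + i + ": " + cur);
      }
      prev = cur;
    }
  }
}
{code}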



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-05-25 Thread Rob Wu (JIRA)
Rob Wu created DRILL-5541:
-

 Summary: C++ Client Crashes During Simple "Man in the Middle" 
Attack Test with Exploitable Write AV
 Key: DRILL-5541
 URL: https://issues.apache.org/jira/browse/DRILL-5541
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - C++
Affects Versions: 1.10.0
Reporter: Rob Wu
Priority: Critical


drillClient!boost_sb::shared_ptr::reset+0xa7:
07fe`c292f827 f0ff4b08lock dec dword ptr [rbx+8] 
ds:07fe`c2b3de78=c29e6060

Exploitability Classification: EXPLOITABLE
Recommended Bug Title: Exploitable - User Mode Write AV starting at 
drillClient!boost_sb::shared_ptr::reset+0x00a7
 (Hash=0x4ae7fdff.0xb15af658)

User mode write access violations that are not near NULL are exploitable.

==
Stack Trace:

Child-SP  RetAddr   Call Site
`030df630 07fe`c295bca1 
drillClient!boost_sb::shared_ptr::reset+0xa7
 
[c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\smart_ptr\shared_ptr.hpp
 @ 620]
`030df680 07fe`c295433c 
drillClient!Drill::DrillClientImpl::processSchemasResult+0x281 
[c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
 @ 1227]
`030df7a0 07fe`c294cbf6 
drillClient!Drill::DrillClientImpl::handleRead+0x75c 
[c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
 @ 1555]
`030df9c0 07fe`c294ce9f 
drillClient!boost_sb::asio::detail::win_iocp_socket_recv_op
 
>,boost_sb::asio::mutable_buffers_1,boost_sb::asio::detail::transfer_all_t,boost_sb::_bi::bind_t,boost_sb::_bi::list4,boost_sb::_bi::value,boost_sb::arg<1>,boost_sb::arg<2> > > > >::do_complete+0x166 
[c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\win_iocp_socket_recv_op.hpp
 @ 97]
`030dfa90 07fe`c296009d 
drillClient!boost_sb::asio::detail::win_iocp_io_service::do_one+0x27f 
[c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
 @ 406]
`030dfb70 07fe`c295ffc9 
drillClient!boost_sb::asio::detail::win_iocp_io_service::run+0xad 
[c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
 @ 164]
`030dfbd0 07fe`c2aa5b53 
drillClient!boost_sb::asio::io_service::run+0x29 
[c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\impl\io_service.ipp
 @ 60]
`030dfc10 07fe`c2ad3e03 drillClient!boost_sb::`anonymous 
namespace'::thread_start_function+0x43
`030dfc50 07fe`c2ad404e drillClient!_callthreadstartex+0x17 
[f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
`030dfc80 `779e59cd drillClient!_threadstartex+0x102 
[f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
`030dfcb0 `77c1a561 kernel32!BaseThreadInitThunk+0xd
`030dfce0 ` ntdll!RtlUserThreadStart+0x1d

==
Register:
rax=0284bae0 rbx=07fec2b3de70 rcx=027ec210
rdx=027ec210 rsi=027f2638 rdi=027f25d0
rip=07fec292f827 rsp=030df630 rbp=027ec210
 r8=027ec210  r9= r10=027d32fc
r11=27eb001b0003 r12= r13=028035a0
r14=027ec210 r15=
iopl=0 nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b efl=00010200
drillClient!boost_sb::shared_ptr::reset+0xa7:
07fe`c292f827 f0ff4b08lock dec dword ptr [rbx+8] 
ds:07fe`c2b3de78=c29e6060



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025568#comment-16025568
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user rchallapalli commented on the issue:

https://github.com/apache/drill/pull/822
  
Based on the current design, if the code senses that there is not sufficient 
memory, it falls back to the old code. I have encountered a case where this 
happened and the old agg did not respect the memory constraints I imposed: I 
gave it 116 MB of memory, and the old hash agg code consumed ~130 MB and 
completed the query. This doesn't play well with the overall resource 
management plan.


> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradual spilling of memory to disk as the available memory gets too 
> small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025488#comment-16025488
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/789
  
Cleaned up the multi-commit mess, rebased on the latest master, and fixed 
minor issues raised in code review comments. Should be ready to commit.


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025441#comment-16025441
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r118592786
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -133,6 +133,9 @@
  the need to turn off join optimization may go away.
*/
   public static final BooleanValidator JOIN_OPTIMIZATION = new 
BooleanValidator("planner.enable_join_optimization", true);
+  // for testing purpose
--- End diff --

@VisibleForTesting annotates methods; but this is a session option. 
Also (hidden) is the possibility that this option may be used in production 
in case some query yields a single phase hashagg but still has too much data to 
handle.
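
For illustration, defining such an option follows the BooleanValidator pattern visible in the diff (the option name below is invented for the sketch, not the one added by this PR):

{code}
// Hypothetical testing/override option: defaulted off, settable per session.
public static final BooleanValidator FORCE_SINGLE_PHASE_HASHAGG =
    new BooleanValidator("planner.force_single_phase_hashagg", false);
{code}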


> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradual spilling of memory to disk as the available memory gets too 
> small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025437#comment-16025437
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118591297
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetSchema.java
 ---
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.parquet.ParquetReaderUtility;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.SchemaElement;
+import org.apache.parquet.hadoop.metadata.BlockMetaData;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.metadata.ParquetMetadata;
+
+import com.google.common.collect.Lists;
+
+/**
+ * Mapping from the schema of the Parquet file to that of the record reader
+ * to the schema that Drill and the Parquet reader uses.
+ */
+
+public class ParquetSchema {
+  /**
+   * Set of columns specified in the SELECT clause. Will be null for
+   * a SELECT * query.
+   */
+  private final Collection<SchemaPath> selectedCols;
+  /**
+   * Parallel list to the columns list above, it is used to determine the 
subset of the project
+   * pushdown columns that do not appear in this file.
+   */
+  private final boolean[] columnsFound;
+  private final OptionManager options;
+  private final int rowGroupIndex;
+  private ParquetMetadata footer;
+  /**
+   * List of metadata for selected columns. This list does two things.
+   * First, it identifies the Parquet columns we wish to select. Second, it
+   * provides metadata for those columns. Note that null columns (columns
+   * in the SELECT clause but not in the file) appear elsewhere.
+   */
+  private List<ParquetColumnMetadata> selectedColumnMetadata = new ArrayList<>();
+  private int bitWidthAllFixedFields;
+  private boolean allFieldsFixedLength;
+  private long groupRecordCount;
+  private int recordsPerBatch;
+
+  /**
+   * Build the Parquet schema. The schema can be based on a "SELECT *",
+   * meaning we want all columns defined in the Parquet file. In this case,
+   * the list of selected columns is null. Or, the query can be based on
+   * an explicit list of selected columns. In this case, the
+   * columns need not exist in the Parquet file. If a column does not 
exist,
+   * the reader returns null for that column. If no selected column exists
+   * in the file, then we return "mock" records: records with only null
+   * values, but repeated for the number of rows in the Parquet file.
+   *
+   * @param options session options
+   * @param rowGroupIndex row group to read
+   * @param selectedCols columns specified in the SELECT clause, or null if
+   * this is a SELECT * query
+   */
+
+  public ParquetSchema(OptionManager options, int rowGroupIndex, Collection<SchemaPath> selectedCols) {
+this.options = options;
+this.rowGroupInd
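
To make the two construction modes described in the javadoc concrete, a short sketch (grounded in the constructor signature above; `options`, `rowGroupIndex`, and `selectedCols` are assumed to be in scope):

{code}
// SELECT * : pass null, so every column defined in the Parquet file is read.
ParquetSchema starSchema = new ParquetSchema(options, rowGroupIndex, null);

// Explicit SELECT list: columns need not exist in the file; per the javadoc,
// missing columns are returned as null-filled vectors.
ParquetSchema projectedSchema = new ParquetSchema(options, rowGroupIndex, selectedCols);
{code}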

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025432#comment-16025432
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118590127
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/BatchReader.java
 ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+
+/**
+ * Base strategy for reading a batch of Parquet records.
+ */
+public abstract class BatchReader {
+
+  protected final ReadState readState;
+
+  public BatchReader(ReadState readState) {
+this.readState = readState;
+  }
+
+  public int readBatch() throws Exception {
+ColumnReader<?> firstColumnStatus = readState.getFirstColumnReader();
+long recordsToRead = Math.min(getReadCount(firstColumnStatus), 
readState.getRecordsToRead());
+int readCount = readRecords(firstColumnStatus, recordsToRead);
+readState.fillNullVectors(readCount);
+return readCount;
+  }
+
+  protected abstract long getReadCount(ColumnReader<?> firstColumnStatus);
+
+  protected abstract int readRecords(ColumnReader<?> firstColumnStatus, long recordsToRead) throws Exception;
+
+  protected void readAllFixedFields(long recordsToRead) throws Exception {
+Stopwatch timer = Stopwatch.createStarted();
+if(readState.useAsyncColReader()){
+  readAllFixedFieldsParallel(recordsToRead);
+} else {
+  readAllFixedFieldsSerial(recordsToRead);
+}
+
readState.parquetReaderStats().timeFixedColumnRead.addAndGet(timer.elapsed(TimeUnit.NANOSECONDS));
+  }
+
+  protected void readAllFixedFieldsSerial(long recordsToRead) throws 
IOException {
+for (ColumnReader<?> crs : readState.getColumnReaders()) {
+  crs.processPages(recordsToRead);
+}
+  }
+
+  protected void readAllFixedFieldsParallel(long recordsToRead) throws 
Exception {
+ArrayList<Future<Long>> futures = Lists.newArrayList();
+for (ColumnReader<?> crs : readState.getColumnReaders()) {
+  Future<Long> f = crs.processPagesAsync(recordsToRead);
+  futures.add(f);
+}
+Exception exception = null;
+for (Future<Long> f : futures) {
+  if (exception != null) {
+f.cancel(true);
+  } else {
+try {
+  f.get();
+} catch (Exception e) {
+  f.cancel(true);
+  exception = e;
+}
+  }
+}
+if (exception != null) {
+  throw exception;
+}
+  }
+
+  /**
+   * Strategy for reading mock records. (What are these?)
+   */
--- End diff --

Fixed. Finally found out what this means. Thanks Jinfeng!
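
For anyone else puzzled by the term: per the ParquetSchema javadoc quoted earlier in this review, "mock" records are rows containing only null values, returned when none of the selected columns exists in the file. A minimal illustration of the idea (class, method, and parameter names assumed):

{code}
import java.util.List;
import org.apache.drill.exec.vector.NullableIntVector;

class MockRecordsSketch {
  // Hypothetical: emit up to batchSize null-only records for rows that exist
  // in the row group but have no selected columns present in the file.
  static int readMockBatch(long rowsRemaining, int batchSize, List<NullableIntVector> nullVectors) {
    int count = (int) Math.min(rowsRemaining, batchSize);
    for (NullableIntVector v : nullVectors) {
      v.getMutator().setValueCount(count);  // every position stays null
    }
    return count;
  }
}
{code}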


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025433#comment-16025433
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118590427
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetColumnMetadata.java
 ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.Map;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.SchemaElement;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
+
+/**
+ * Represents a single column read from the Parquet file by the record 
reader.
+ */
+
+public class ParquetColumnMetadata {
+
+  ColumnDescriptor column;
+  private SchemaElement se;
+  MaterializedField field;
+  int length;
+  private MajorType type;
+  ColumnChunkMetaData columnChunkMetaData;
+  private ValueVector vector;
+
+  public ParquetColumnMetadata(ColumnDescriptor column) {
+this.column = column;
+  }
+
+  public void resolveDrillType(Map<String, SchemaElement> schemaElements, OptionManager options) {
+se = schemaElements.get(column.getPath()[0]);
+type = ParquetToDrillTypeConverter.toMajorType(column.getType(), 
se.getType_length(),
+getDataMode(column), se, options);
+field = MaterializedField.create(toFieldName(column.getPath()), type);
+length = getDataTypeLength();
+  }
+
+  private String toFieldName(String[] paths) {
+return SchemaPath.getCompoundPath(paths).getAsUnescapedPath();
+  }
+
+  private TypeProtos.DataMode getDataMode(ColumnDescriptor column) {
+if (isRepeated()) {
+  return DataMode.REPEATED;
+} else if (column.getMaxDefinitionLevel() == 0) {
+  return TypeProtos.DataMode.REQUIRED;
+} else {
+  return TypeProtos.DataMode.OPTIONAL;
+}
+  }
+
+  /**
+   * @param type
+   * @param type a fixed length type from the parquet library enum
+   * @return the length in pageDataByteArray of the type
+   */
+  public static int getTypeLengthInBits(PrimitiveTypeName type) {
+switch (type) {
+  case INT64:   return 64;
+  case INT32:   return 32;
+  case BOOLEAN: return 1;
+  case FLOAT:   return 32;
+  case DOUBLE:  return 64;
+  case INT96:   return 96;
+  // binary and fixed length byte array
+  default:
+throw new IllegalStateException("Length cannot be determined for 
type " + type);
+}
+  }
+
+  /**
+   * Returns data type length for a given {@see ColumnDescriptor} and it's 
corresponding
+   * {@see SchemaElement}. Neither is enough information alone as the max
+   * repetition level (indicating if it is an array type) is in the 
ColumnDescriptor and
+   * the length of a fixed width field is stored at the schema level.

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025435#comment-16025435
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118591078
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 ---
@@ -308,163 +232,50 @@ public FragmentContext getFragmentContext() {
   }
 
   /**
-   * Returns data type length for a given {@see ColumnDescriptor} and it's 
corresponding
-   * {@see SchemaElement}. Neither is enough information alone as the max
-   * repetition level (indicating if it is an array type) is in the 
ColumnDescriptor and
-   * the length of a fixed width field is stored at the schema level.
-   *
-   * @return the length if fixed width, else -1
+   * Prepare the Parquet reader. First determine the set of columns to 
read (the schema
+   * for this read.) Then, create a state object to track the read across 
calls to
+   * the reader next() method. Finally, create one of three 
readers to
+   * read batches depending on whether this scan is for only fixed-width 
fields,
+   * contains at least one variable-width field, or is a "mock" scan 
consisting
+   * only of null fields (fields in the SELECT clause but not in the 
Parquet file.)
*/
-  private int getDataTypeLength(ColumnDescriptor column, SchemaElement se) 
{
-if (column.getType() != PrimitiveType.PrimitiveTypeName.BINARY) {
-  if (column.getMaxRepetitionLevel() > 0) {
-return -1;
-  }
-  if (column.getType() == 
PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) {
-return se.getType_length() * 8;
-  } else {
-return getTypeLengthInBits(column.getType());
-  }
-} else {
-  return -1;
-}
-  }
 
-  @SuppressWarnings({ "resource", "unchecked" })
   @Override
   public void setup(OperatorContext operatorContext, OutputMutator output) 
throws ExecutionSetupException {
 this.operatorContext = operatorContext;
-if (!isStarQuery()) {
-  columnsFound = new boolean[getColumns().size()];
-  nullFilledVectors = new ArrayList<>();
-}
-columnStatuses = new ArrayList<>();
-List<ColumnDescriptor> columns = footer.getFileMetaData().getSchema().getColumns();
-allFieldsFixedLength = true;
-ColumnDescriptor column;
-ColumnChunkMetaData columnChunkMetaData;
-int columnsToScan = 0;
-mockRecordsRead = 0;
-
-MaterializedField field;
+schema = new ParquetSchema(fragmentContext.getOptions(), 
rowGroupIndex, isStarQuery() ? null : getColumns());
 
 logger.debug("Reading row group({}) with {} records in file {}.", 
rowGroupIndex, footer.getBlocks().get(rowGroupIndex).getRowCount(),
 hadoopPath.toUri().getPath());
-totalRecordsRead = 0;
-
-// TODO - figure out how to deal with this better once we add nested 
reading, note also look where this map is used below
-// store a map from column name to converted types if they are non-null
-Map<String, SchemaElement> schemaElements = ParquetReaderUtility.getColNameToSchemaElementMapping(footer);
-
-// loop to add up the length of the fixed width columns and build the 
schema
-for (int i = 0; i < columns.size(); ++i) {
-  column = columns.get(i);
-  SchemaElement se = schemaElements.get(column.getPath()[0]);
-  MajorType mt = 
ParquetToDrillTypeConverter.toMajorType(column.getType(), se.getType_length(),
-  getDataMode(column), se, fragmentContext.getOptions());
-  field = MaterializedField.create(toFieldName(column.getPath()), mt);
-  if ( ! fieldSelected(field)) {
-continue;
-  }
-  columnsToScan++;
-  int dataTypeLength = getDataTypeLength(column, se);
-  if (dataTypeLength == -1) {
-allFieldsFixedLength = false;
-  } else {
-bitWidthAllFixedFields += dataTypeLength;
-  }
-}
-
-if (columnsToScan != 0  && allFieldsFixedLength) {
-  recordsPerBatch = (int) Math.min(Math.min(batchSize / 
bitWidthAllFixedFields,
-  footer.getBlocks().get(0).getColumns().get(0).getValueCount()), 
DEFAULT_RECORDS_TO_READ_IF_FIXED_WIDTH);
-}
-else {
-  recordsPerBatch = DEFAULT_RECORDS_TO_READ_IF_VARIABLE_WIDTH;
-}
 
 try {
-  ValueVector vector;
-  SchemaElement schemaElement;
-  final ArrayList<VarLengthColumn<? extends ValueVector>> varLengthColumns = new ArrayList<>();
-  // initialize all of the column read status objects
-  boolean fieldFixedLength;
- 
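
A sketch of the three-way reader selection the setup() javadoc above describes (the concrete reader class names are assumed for illustration; only the selection criteria come from the javadoc, and `allFieldsFixedLength` appears in the diff):

{code}
// Pick a batch reader: fixed-width columns only, at least one
// variable-width column, or a "mock" scan of null-only fields.
BatchReader reader;
if (noSelectedColumnExistsInFile) {
  reader = new MockBatchReader(readState);
} else if (allFieldsFixedLength) {
  reader = new FixedWidthReader(readState);
} else {
  reader = new VariableWidthReader(readState);
}
{code}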

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025436#comment-16025436
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118590602
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetColumnMetadata.java
 ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.Map;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.SchemaElement;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
+
+/**
+ * Represents a single column read from the Parquet file by the record 
reader.
+ */
+
+public class ParquetColumnMetadata {
+
+  ColumnDescriptor column;
+  private SchemaElement se;
+  MaterializedField field;
+  int length;
+  private MajorType type;
+  ColumnChunkMetaData columnChunkMetaData;
+  private ValueVector vector;
+
+  public ParquetColumnMetadata(ColumnDescriptor column) {
+this.column = column;
+  }
+
+  public void resolveDrillType(Map<String, SchemaElement> schemaElements, OptionManager options) {
+se = schemaElements.get(column.getPath()[0]);
+type = ParquetToDrillTypeConverter.toMajorType(column.getType(), 
se.getType_length(),
+getDataMode(column), se, options);
+field = MaterializedField.create(toFieldName(column.getPath()), type);
+length = getDataTypeLength();
+  }
+
+  private String toFieldName(String[] paths) {
+return SchemaPath.getCompoundPath(paths).getAsUnescapedPath();
+  }
+
+  private TypeProtos.DataMode getDataMode(ColumnDescriptor column) {
+if (isRepeated()) {
+  return DataMode.REPEATED;
+} else if (column.getMaxDefinitionLevel() == 0) {
+  return TypeProtos.DataMode.REQUIRED;
+} else {
+  return TypeProtos.DataMode.OPTIONAL;
+}
+  }
+
+  /**
+   * @param type
+   * @param type a fixed length type from the parquet library enum
+   * @return the length in pageDataByteArray of the type
+   */
+  public static int getTypeLengthInBits(PrimitiveTypeName type) {
+switch (type) {
+  case INT64:   return 64;
+  case INT32:   return 32;
+  case BOOLEAN: return 1;
+  case FLOAT:   return 32;
+  case DOUBLE:  return 64;
+  case INT96:   return 96;
+  // binary and fixed length byte array
+  default:
+throw new IllegalStateException("Length cannot be determined for 
type " + type);
+}
+  }
+
+  /**
+   * Returns data type length for a given {@see ColumnDescriptor} and it's 
corresponding
+   * {@see SchemaElement}. Neither is enough information alone as the max
+   * repetition level (indicating if it is an array type) is in the 
ColumnDescriptor and
+   * the length of a fixed width field is stored at the schema level.

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025434#comment-16025434
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r118591443
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetInternalsTest.java
 ---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import static org.junit.Assert.*;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.drill.TestBuilder;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.FixtureBuilder;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class ParquetInternalsTest extends ClusterTest {
+
+  @BeforeClass
+  public static void setup( ) throws Exception {
+FixtureBuilder builder = ClusterFixture.builder()
+  // Set options, etc.
+  ;
+startCluster(builder);
+  }
+
+  @Test
+  public void testFixedWidth() throws Exception {
+String sql = "SELECT l_orderkey, l_partkey, l_suppkey, l_linenumber, 
l_quantity\n" +
+ "FROM `cp`.`tpch/lineitem.parquet` LIMIT 20";
+//client.queryBuilder().sql(sql).printCsv();
+
+Map<SchemaPath, TypeProtos.MajorType> typeMap = new HashMap<>();
+typeMap.put(TestBuilder.parsePath("l_orderkey"), 
Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_partkey"), 
Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_suppkey"), 
Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_linenumber"), 
Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_quantity"), 
Types.required(TypeProtos.MinorType.FLOAT8));
+client.testBuilder()
+  .sqlQuery(sql)
+  .unOrdered()
+  .csvBaselineFile("parquet/expected/fixedWidth.csv")
+  .baselineColumns("l_orderkey", "l_partkey", "l_suppkey", 
"l_linenumber", "l_quantity")
+  .baselineTypes(typeMap)
+  .build()
+  .run();
+  }
+
+
--- End diff --

Fixed.


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5485) Remove WebServer dependency on DrillClient

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025417#comment-16025417
 ] 

ASF GitHub Bot commented on DRILL-5485:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/829#discussion_r118587783
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebServer.java 
---
@@ -219,12 +232,43 @@ public void sessionDestroyed(HttpSessionEvent se) {
   securityHandler.logout(sessionAuth);
   session.removeAttribute(SessionAuthentication.__J_AUTHENTICATED);
 }
+
+// Clear all the custom attributes set as part of session
+clearSessionCustomAttributes(session);
   }
 });
 
 return new SessionHandler(sessionManager);
   }
 
+  private void clearSessionCustomAttributes(HttpSession session) {
--- End diff --

(I somehow managed to delete this in my comment..) The life cycle of those 
resources could be placed together in one class, as resources are being 
initialized in one place but closed in different places.


> Remove WebServer dependency on DrillClient
> --
>
> Key: DRILL-5485
> URL: https://issues.apache.org/jira/browse/DRILL-5485
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Reporter: Sorabh Hamirwasia
> Fix For: 1.11.0
>
>
> With encryption support using SASL, clients won't be able to authenticate 
> using PLAIN mechanism when encryption is enabled on the cluster. Today 
> WebServer which is embedded inside Drillbit creates a DrillClient instance 
> for each WebClient session. And the WebUser is authenticated as part of 
> authentication between DrillClient instance and Drillbit using PLAIN 
> mechanism. But with encryption enabled this will fail since encryption 
> doesn't support authentication using PLAIN mechanism, hence no WebClient can 
> connect to a Drillbit. There are additional issues with this approach as well:
> 1) Since DrillClient is used per WebUser session this is expensive as it has 
> heavyweight RPC layer for DrillClient and all its dependencies. 
> 2) If the Foreman for a WebUser is also selected to be a different node then 
> there will be extra hop of transferring data back to WebClient.
> To resolve all the above issues, it would be better to authenticate the WebUser 
> locally using the Drillbit on which WebServer is running without creating 
> DrillClient instance. We can use the local PAMAuthenticator to authenticate 
> the user. After authentication is successful the local Drillbit can also 
> serve as the Foreman for all the queries submitted by WebUser. This can be 
> achieved by submitting the query to the local Drillbit Foreman work queue. 
> This will also remove the requirement to encrypt the channel opened between 
> WebServer (DrillClient) and selected Drillbit since with this approach there 
> won't be any physical channel opened between them.
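
A sketch of the local-authentication idea under the stated assumptions (Drill's UserAuthenticator interface is real; the surrounding wiring is illustrative, not the actual change):

{code}
import org.apache.drill.exec.rpc.user.security.UserAuthenticationException;
import org.apache.drill.exec.rpc.user.security.UserAuthenticator;

// Validate web credentials against the local Drillbit's configured
// authenticator (e.g. PAM) instead of opening a DrillClient connection.
boolean authenticateLocally(UserAuthenticator authenticator, String user, String password) {
  try {
    authenticator.authenticate(user, password);  // throws on bad credentials
    return true;  // on success, queries can go to the local Foreman work queue
  } catch (UserAuthenticationException e) {
    return false;
  }
}
{code}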



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5485) Remove WebServer dependency on DrillClient

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025383#comment-16025383
 ] 

ASF GitHub Bot commented on DRILL-5485:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/829#discussion_r118582764
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/UserClientConnection.java
 ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc;
+
+import io.netty.channel.ChannelFuture;
+import org.apache.drill.exec.physical.impl.materialize.QueryWritableBatch;
+import org.apache.drill.exec.proto.GeneralRPCProtos;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.rpc.user.UserSession;
+
+import java.net.SocketAddress;
+
+/**
+ * Interface for getting user session properties and interacting with user 
connection. Separating this interface from
+ * {@link AbstractRemoteConnection} implementation for user connection:
+ * 
+ *  Connection is passed to Foreman and Screen operators. Instead 
passing this interface exposes few details.
+ *  Makes it easy to have wrappers around user connection which can be 
helpful to tap the messages and data
+ * going to the actual client.
+ * 
+ */
+public interface UserClientConnection {
+  /**
+   * @return User session object.
+   */
+  UserSession getSession();
+
+  /**
+   * Send query result outcome to client. Outcome is returned through 
listener
+   *
+   * @param listener
+   * @param result
+   */
+  void sendResult(RpcOutcomeListener<GeneralRPCProtos.Ack> listener, UserBitShared.QueryResult result);
--- End diff --

Not fixed?


> Remove WebServer dependency on DrillClient
> --
>
> Key: DRILL-5485
> URL: https://issues.apache.org/jira/browse/DRILL-5485
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Reporter: Sorabh Hamirwasia
> Fix For: 1.11.0
>
>
> With encryption support using SASL, clients won't be able to authenticate 
> using PLAIN mechanism when encryption is enabled on the cluster. Today 
> WebServer which is embedded inside Drillbit creates a DrillClient instance 
> for each WebClient session. And the WebUser is authenticated as part of 
> authentication between DrillClient instance and Drillbit using PLAIN 
> mechanism. But with encryption enabled this will fail since encryption 
> doesn't support authentication using PLAIN mechanism, hence no WebClient can 
> connect to a Drillbit. There are additional issues with this approach as well:
> 1) Since DrillClient is used per WebUser session this is expensive as it has 
> heavyweight RPC layer for DrillClient and all its dependencies. 
> 2) If the Foreman for a WebUser is also selected to be a different node then 
> there will be extra hop of transferring data back to WebClient.
> To resolve all the above issues, it would be better to authenticate the WebUser 
> locally using the Drillbit on which WebServer is running without creating 
> DrillClient instance. We can use the local PAMAuthenticator to authenticate 
> the user. After authentication is successful the local Drillbit can also 
> serve as the Foreman for all the queries submitted by WebUser. This can be 
> achieved by submitting the query to the local Drillbit Foreman work queue. 
> This will also remove the requirement to encrypt the channel opened between 
> WebServer (DrillClient) and selected Drillbit since with this approach there 
> won't be any physical channel opened between them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5485) Remove WebServer dependency on DrillClient

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025369#comment-16025369
 ] 

ASF GitHub Bot commented on DRILL-5485:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/829#discussion_r118580820
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/AbstractUserClientConnectionWrapper.java
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc;
+
+import com.google.common.base.Preconditions;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.exec.proto.GeneralRPCProtos;
+import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
+import org.apache.drill.exec.proto.UserBitShared.QueryId;
+import org.apache.drill.exec.proto.UserBitShared.QueryResult;
+import org.apache.drill.exec.proto.helper.QueryIdHelper;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+
+public abstract class AbstractUserClientConnectionWrapper implements 
UserClientConnection {
+  private static final org.slf4j.Logger logger =
+  
org.slf4j.LoggerFactory.getLogger(AbstractUserClientConnectionWrapper.class);
+
+  protected final CountDownLatch latch = new CountDownLatch(1);
+
+  protected volatile DrillPBError error;
+
+  protected volatile UserException exception;
+
+  /**
+   * Wait until the query has completed or timeout is passed.
+   *
+   * @throws InterruptedException
+   */
+  public boolean await(final long timeoutMillis) throws 
InterruptedException {
+return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
+  }
+
+  /**
+   * Wait indefinitely until the query is completed. Used only in case of 
WebUser
+   *
+   * @throws Exception
+   */
+  public void await() throws Exception {
+latch.await();
+if (exception != null) {
+  throw exception;
+}
+  }
+
+  @Override
+  public void sendResult(RpcOutcomeListener<GeneralRPCProtos.Ack> listener, QueryResult result) {
+
+Preconditions.checkState(result.hasQueryState());
+
+// Release the wait latch if the query is terminated.
+final QueryResult.QueryState state = result.getQueryState();
+final QueryId queryId = result.getQueryId();
+
+if (logger.isDebugEnabled()) {
+  logger.debug("Result arrived for QueryId: {} with QueryState: {}", 
QueryIdHelper.getQueryId(queryId), state);
+}
+
+switch (state) {
+  case FAILED:
+error = result.getError(0);
+exception = new UserRemoteException(error);
+latch.countDown();
+break;
+  case CANCELED:
+  case COMPLETED:
+Preconditions.checkState(result.getErrorCount() == 0);
+latch.countDown();
+break;
+  default:
+logger.error("Query with QueryId: {} is in unexpected state: {}", 
queryId, state);
--- End diff --

That may be an issue as well.

AFAIK 
[DRILL-2498](https://github.com/apache/drill/commit/1d9d82b001810605e3f94ab3a5517dc0ed739715#diff-158c887d198393117d3a1bbc42114a8b)
 ensures that only the final state is sent to client using `sendResult`; this 
is the terminal message from server to client for that query. So if that 
message is wrong, the query is in an illegal state.


> Remove WebServer dependency on DrillClient
> --
>
> Key: DRILL-5485
> URL: https://issues.apache.org/jira/browse/DRILL-5485
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Reporter: Sorabh Hamirwasia
> Fix For: 1.11.0
>
>
> With encryption support using SASL, clients won't be able t

[jira] [Updated] (DRILL-5229) Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5229:
---
Labels: ready-to-commit  (was: )

> Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0 
> -
>
> Key: DRILL-5229
> URL: https://issues.apache.org/jira/browse/DRILL-5229
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.8.0
>Reporter: Rahul Raj
>Assignee: Sudheesh Katkam
>  Labels: ready-to-commit
>
> Getting an error - "out-of-order key" - for the query select v, count(k) from
> kudu.test group by v, where k is the primary key. This happens only when the
> aggregation is done on the primary key. Should Drill move to the latest Kudu
> client to investigate this further?
> The current Drill Kudu connector uses org.kududb:kudu-client:0.6.0 from the
> Cloudera repository, while the latest released library,
> org.apache.kudu:kudu-client:1.2.0, is hosted on Maven Central. There are a
> few breaking changes in the new library:
> 1. TIMESTAMP renamed to UNIXTIME_MICROS
> 2. In KuduRecordReader#setup -
> KuduScannerBuilder#lowerBoundPartitionKeyRaw renamed to lowerBoundRaw
> and KuduScannerBuilder#exclusiveUpperBoundPartitionKeyRaw renamed to
> exclusiveUpperBoundRaw. Both methods are deprecated.
> 3. In KuduRecordWriterImpl#updateSchema - client.createTable(name,
> kuduSchema) requires CreateTableOptions as the third argument (see the sketch below)
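
A minimal sketch of the post-upgrade createTable call against the
org.apache.kudu:kudu-client:1.2.0 API. The master address, table name, schema, and
hash partitioning below are illustrative, not taken from the Drill connector:
{code}
import java.util.Arrays;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;

public class KuduCreateTableSketch {
  public static void main(String[] args) throws Exception {
    KuduClient client =
        new KuduClient.KuduClientBuilder("localhost:7051").build();
    try {
      Schema schema = new Schema(Arrays.asList(
          new ColumnSchema.ColumnSchemaBuilder("k", Type.INT64).key(true).build(),
          new ColumnSchema.ColumnSchemaBuilder("v", Type.STRING).build()));
      // Breaking change 3: 1.2.0 requires CreateTableOptions as the third argument.
      CreateTableOptions options = new CreateTableOptions()
          .addHashPartitions(Arrays.asList("k"), 4);
      client.createTable("test", schema, options);
    } finally {
      client.close();
    }
  }
}
{code}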



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5229) Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5229:
---
Fix Version/s: (was: 2.0.0)

> Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0 
> -
>
> Key: DRILL-5229
> URL: https://issues.apache.org/jira/browse/DRILL-5229
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.8.0
>Reporter: Rahul Raj
>
> Getting an error - "out-of-order key" - for the query select v, count(k) from
> kudu.test group by v, where k is the primary key. This happens only when the
> aggregation is done on the primary key. Should Drill move to the latest Kudu
> client to investigate this further?
> The current Drill Kudu connector uses org.kududb:kudu-client:0.6.0 from the
> Cloudera repository, while the latest released library,
> org.apache.kudu:kudu-client:1.2.0, is hosted on Maven Central. There are a
> few breaking changes in the new library:
> 1. TIMESTAMP renamed to UNIXTIME_MICROS
> 2. In KuduRecordReader#setup -
> KuduScannerBuilder#lowerBoundPartitionKeyRaw renamed to lowerBoundRaw
> and KuduScannerBuilder#exclusiveUpperBoundPartitionKeyRaw renamed to
> exclusiveUpperBoundRaw. Both methods are deprecated.
> 3. In KuduRecordWriterImpl#updateSchema - client.createTable(name,
> kuduSchema) requires CreateTableOptions as the third argument



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5229) Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-5229:
--

Assignee: Sudheesh Katkam

> Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0 
> -
>
> Key: DRILL-5229
> URL: https://issues.apache.org/jira/browse/DRILL-5229
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.8.0
>Reporter: Rahul Raj
>Assignee: Sudheesh Katkam
>
> Getting an error - "out-of-order key" - for the query select v, count(k) from
> kudu.test group by v, where k is the primary key. This happens only when the
> aggregation is done on the primary key. Should Drill move to the latest Kudu
> client to investigate this further?
> The current Drill Kudu connector uses org.kududb:kudu-client:0.6.0 from the
> Cloudera repository, while the latest released library,
> org.apache.kudu:kudu-client:1.2.0, is hosted on Maven Central. There are a
> few breaking changes in the new library:
> 1. TIMESTAMP renamed to UNIXTIME_MICROS
> 2. In KuduRecordReader#setup -
> KuduScannerBuilder#lowerBoundPartitionKeyRaw renamed to lowerBoundRaw
> and KuduScannerBuilder#exclusiveUpperBoundPartitionKeyRaw renamed to
> exclusiveUpperBoundRaw. Both methods are deprecated.
> 3. In KuduRecordWriterImpl#updateSchema - client.createTable(name,
> kuduSchema) requires CreateTableOptions as the third argument



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5229) Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025336#comment-16025336
 ] 

ASF GitHub Bot commented on DRILL-5229:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/828
  
+1

The error seems unrelated to the changes, and all tests pass. Thank you for 
the PR!


> Upgrade kudu client to org.apache.kudu:kudu-client:1.2.0 
> -
>
> Key: DRILL-5229
> URL: https://issues.apache.org/jira/browse/DRILL-5229
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.8.0
>Reporter: Rahul Raj
> Fix For: 2.0.0
>
>
> Getting an error - "out-of-order key" - for the query select v, count(k) from
> kudu.test group by v, where k is the primary key. This happens only when the
> aggregation is done on the primary key. Should Drill move to the latest Kudu
> client to investigate this further?
> The current Drill Kudu connector uses org.kududb:kudu-client:0.6.0 from the
> Cloudera repository, while the latest released library,
> org.apache.kudu:kudu-client:1.2.0, is hosted on Maven Central. There are a
> few breaking changes in the new library:
> 1. TIMESTAMP renamed to UNIXTIME_MICROS
> 2. In KuduRecordReader#setup -
> KuduScannerBuilder#lowerBoundPartitionKeyRaw renamed to lowerBoundRaw
> and KuduScannerBuilder#exclusiveUpperBoundPartitionKeyRaw renamed to
> exclusiveUpperBoundRaw. Both methods are deprecated.
> 3. In KuduRecordWriterImpl#updateSchema - client.createTable(name,
> kuduSchema) requires CreateTableOptions as the third argument



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4984) Limit 0 raises NullPointerException on JDBC storage sources

2017-05-25 Thread Holger Kiel (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025177#comment-16025177
 ] 

Holger Kiel commented on DRILL-4984:


Also unable to use Drill as a JDBC source in Spark/Scala because of this bug.

> Limit 0 raises NullPointerException on JDBC storage sources
> ---
>
> Key: DRILL-4984
> URL: https://issues.apache.org/jira/browse/DRILL-4984
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0, 1.9.0, 1.10.0
> Environment: Latest 1.9 release also 1.8 release version,
> mysql-connector-java-5.1.30, mysql-connector-java-5.1.40
>Reporter: Holger Kiel
>
> NullPointerExceptions occur when a query with 'limit 0' is executed on a JDBC 
> storage source (e.g. MySQL):
> {code}
> 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 0;
> Error: SYSTEM ERROR: NullPointerException
> [Error Id: 6cd676fc-6db9-40b3-81d5-c2db044aeb77 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: null
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.visit():55
> org.apache.calcite.rel.core.TableScan.accept():166
> org.apache.calcite.rel.RelShuttleImpl.visitChild():53
> org.apache.calcite.rel.RelShuttleImpl.visitChildren():68
> org.apache.calcite.rel.RelShuttleImpl.visit():126
> org.apache.calcite.rel.AbstractRelNode.accept():256
> org.apache.calcite.rel.RelShuttleImpl.visitChild():53
> org.apache.calcite.rel.RelShuttleImpl.visitChildren():68
> org.apache.calcite.rel.RelShuttleImpl.visit():126
> org.apache.calcite.rel.AbstractRelNode.accept():256
> 
> org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.canForceSingleMode():45
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():262
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():290
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():123
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():97
> org.apache.drill.exec.work.foreman.Foreman.runSQL():1008
> org.apache.drill.exec.work.foreman.Foreman.run():264
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 1;
> +-+-+++-+
> | id  | first_name  |   last_name| full_name  | manager_id  |
> +-+-+++-+
> | 1   | null| Administrator  | admin  | 0   |
> +-+-+++-+
> 1 row selected (0,235 seconds)
> {code}
> Other datasources are okay:
> {code}
> 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 0;
> +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+
> | fqn  | filename  | filepath  | suffix  | employee_id  | full_name  | 
> first_name  | last_name  | position_id  | position_title  | store_id  | 
> department_id  | birth_date  | hire_date  | salary  | supervisor_id  | 
> education_level  | marital_status  | gender  | management_role  |
> +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+
> +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+
> No rows selected (0,309 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025124#comment-16025124
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r118545693
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -179,6 +179,26 @@ drill.exec: {
 // Use plain Java compilation where available
 prefer_plain_java: false
   },
+  spill: {
--- End diff --

Added "spill" and "hashagg" sections in the override example file, with 
some comments:

  spill: {
 # These options are common to all spilling operators.
 # They can be overriden, per operator (but this is just for
 # backward compatibility, and may be deprecated in the future)
 directories : [ "/tmp/drill/spill" ],
 fs : "file:///"
  }
  hashagg: {
# The partitions divide the work inside the hashagg, to ease
# handling spilling. This initial figure is tuned down when
# memory is limited.
#  Setting this option to 1 disables spilling !
num_partitions: 32,
spill: {
# The 2 options below override the common ones
# they should be deprecated in the future
directories : [ "/tmp/drill/spill" ],
fs : "file:///"
}
  },



> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradual spilling memory to disk as the available memory gets too 
> small to allow in memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5533) Fix flag assignment in FunctionInitializer.checkInit() method

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5533:
---
Labels: ready-to-commit  (was: )

> Fix flag assignment in FunctionInitializer.checkInit() method
> -
>
> Key: DRILL-5533
> URL: https://issues.apache.org/jira/browse/DRILL-5533
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
>
> The FunctionInitializer.checkInit() method uses double-checked locking (DCL) to ensure 
> that the function body is loaded only once. But the flag parameter is never updated, so 
> all threads enter the synchronized block (a sketch of the corrected shape follows below).
> Also, FunctionInitializer.getImports() always returns an empty list.
> https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionInitializer.java
> Changes:
> 1. Fix the DCL in the FunctionInitializer.checkInit() method (update the flag parameter 
> when the function body is loaded).
> 2. Fix the ImportGrabber.getImports() method to return the list of imports.
> 3. Add unit tests for FunctionInitializer.
> 4. Minor refactoring (rename methods, add javadoc).
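
For readers unfamiliar with the pattern, a minimal sketch of the corrected
double-checked-locking shape described in change 1. Field and method names are
illustrative, not the actual Drill code:
{code}
public class DclSketch {
  private volatile boolean ready;   // volatile is essential for safe DCL

  public void checkInit() {
    if (!ready) {                   // first, unsynchronized check
      synchronized (this) {
        if (!ready) {               // second check, under the lock
          loadFunctionBody();
          ready = true;             // the fix: set the flag once loaded
        }
      }
    }
  }

  private void loadFunctionBody() {
    // expensive one-time work goes here
  }
}
{code}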



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5533) Fix flag assignment in FunctionInitializer.checkInit() method

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025064#comment-16025064
 ] 

ASF GitHub Bot commented on DRILL-5533:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/843
  
+1


> Fix flag assignment in FunctionInitializer.checkInit() method
> -
>
> Key: DRILL-5533
> URL: https://issues.apache.org/jira/browse/DRILL-5533
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
>
> The FunctionInitializer.checkInit() method uses double-checked locking (DCL) to ensure 
> that the function body is loaded only once. But the flag parameter is never updated, so 
> all threads enter the synchronized block.
> Also, FunctionInitializer.getImports() always returns an empty list.
> https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionInitializer.java
> Changes:
> 1. Fix the DCL in the FunctionInitializer.checkInit() method (update the flag parameter 
> when the function body is loaded).
> 2. Fix the ImportGrabber.getImports() method to return the list of imports.
> 3. Add unit tests for FunctionInitializer.
> 4. Minor refactoring (rename methods, add javadoc).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025045#comment-16025045
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/789
  
Thanks. I'll clean up the messy commits today. Not sure how it picked up 
the other six commits...


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4824) Add not-provided and null states for map and list fields in JSON

2017-05-25 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025040#comment-16025040
 ] 

Paul Rogers commented on DRILL-4824:


The trick, of course, to adding the new null states is that the existing "bit" 
vector is used by all operators in code generation, and by Drill clients such 
as ODBC and JDBC drivers. Further, Apache Arrow is a fork of Drill, so 
improving our null support will drive the two projects further apart. Planning 
for all this stuff is required before we start writing code.

For example, if we know that a client is a version before this fix, we can 
translate the new null vector into the "legacy" bit vector. But, Drill does not 
have a versioned client API, so we have no way to know the version of the 
client. So, we have to tackle that problem as well.

In short, this is an important, but non-trivial, project.
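
To make the translation idea concrete, a heavily hypothetical sketch: suppose the
new representation keeps one state per value (not provided / null / set) and an old
client expects the legacy one-bit "is set" vector. The mapping below, including the
state encoding, is invented for illustration only:
{code}
public class NullStateTranslationSketch {
  // Hypothetical states: 0 = not provided, 1 = null, 2 = set.
  // Legacy clients see a single bit: only "set" maps to 1.
  public static byte[] toLegacyBits(byte[] states) {
    byte[] bits = new byte[(states.length + 7) / 8];
    for (int i = 0; i < states.length; i++) {
      if (states[i] == 2) {
        bits[i / 8] |= (byte) (1 << (i % 8));
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    byte[] states = {2, 1, 0, 2};  // set, null, not provided, set
    System.out.println(Integer.toBinaryString(toLegacyBits(states)[0] & 0xFF)); // 1001
  }
}
{code}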

> Add not-provided and null states for map and list fields in JSON
> 
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Roman
>Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---+
> |  Field1   |
> +---+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2" 
> {"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}
> There is no need to output missing fields. In the case of a deeply nested 
> structure, we will get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--+
> | Field1   |
> +--+
> |{} 
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024996#comment-16024996
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/789
  
I took the entire patch and applied it to master (use git am -3). Git 
manages to figure out that the commits are already applied. One commit caused a 
merge conflict and I skipped it. In the end it left me with only the one 
commit. 


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4824) Add not-provided and null states for map and list fields in JSON

2017-05-25 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024994#comment-16024994
 ] 

Paul Rogers commented on DRILL-4824:


Turns out there is a flaw in the value vector code: it does not "back fill" 
missing offset vector values for repeated types. The logic works fine for 
Varchar columns, but not for repeated columns.

The repeated-type problem will be fixed as part of the memory fragmentation 
work, in which we are creating a new version of the "writers" used to move data 
into value vectors. Please don't spend time fixing this part of the current 
code, as the existing code will be retired.
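
To make the "back fill" concrete: a repeated (or variable-width) vector stores one
end offset per row, and a writer that skips rows must repeat the last offset so every
row remains addressable. A self-contained sketch of that invariant, independent of
the actual vector classes:
{code}
import java.util.ArrayList;
import java.util.List;

// offsets.get(i + 1) - offsets.get(i) is the length of row i.
// Back-filling repeats the last end offset for rows that carry no data,
// so skipped rows read back as empty instead of corrupting later rows.
public class OffsetBackfillSketch {
  public static void main(String[] args) {
    List<Integer> offsets = new ArrayList<>();
    offsets.add(0);                       // row 0 starts at 0

    writeRow(offsets, 3);                 // row 0: 3 values
    skipRows(offsets, 2);                 // rows 1-2: back-filled as empty
    writeRow(offsets, 1);                 // row 3: 1 value

    System.out.println(offsets);          // [0, 3, 3, 3, 4]
  }

  static void writeRow(List<Integer> offsets, int valueCount) {
    offsets.add(offsets.get(offsets.size() - 1) + valueCount);
  }

  static void skipRows(List<Integer> offsets, int count) {
    int last = offsets.get(offsets.size() - 1);
    for (int i = 0; i < count; i++) {
      offsets.add(last);                  // the missing back-fill step
    }
  }
}
{code}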

> Add not-provided and null states for map and list fields in JSON
> 
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Roman
>Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---+
> |  Field1   |
> +---+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2" 
> {"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}
> There is no need to output missing fields. In the case of a deeply nested 
> structure, we will get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--+
> | Field1   |
> +--+
> |{} 
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5379) Set Hdfs Block Size based on Parquet Block Size

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-5379:
--

Assignee: Sudheesh Katkam

> Set Hdfs Block Size based on Parquet Block Size
> ---
>
> Key: DRILL-5379
> URL: https://issues.apache.org/jira/browse/DRILL-5379
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: F Méthot
>Assignee: Sudheesh Katkam
>  Labels: ready-to-commit
> Fix For: Future
>
>
> It seems there is a way to force Drill to store a CTAS-generated Parquet file as a 
> single block when using HDFS. The Java HDFS API allows this: files could be 
> created with the Parquet block size set in a session or system config.
> It is ideal to have a single Parquet file per HDFS block.
> Here is the HDFS API that allows this:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
> Drill uses the Hadoop ParquetFileWriter 
> (https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java).
> This is where the file creation occurs, so it might be tricky.
> However, ParquetRecordWriter.java 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java)
>  in Drill creates the ParquetFileWriter with a Hadoop Configuration object.
> Something to explore: could the block size be set as a property on the 
> Configuration object before passing it to the ParquetFileWriter constructor?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5379) Set Hdfs Block Size based on Parquet Block Size

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-5379:
--

Assignee: Padma Penumarthy  (was: Sudheesh Katkam)

> Set Hdfs Block Size based on Parquet Block Size
> ---
>
> Key: DRILL-5379
> URL: https://issues.apache.org/jira/browse/DRILL-5379
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: F Méthot
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Fix For: Future
>
>
> It seems there is a way to force Drill to store a CTAS-generated Parquet file as a 
> single block when using HDFS. The Java HDFS API allows this: files could be 
> created with the Parquet block size set in a session or system config.
> It is ideal to have a single Parquet file per HDFS block.
> Here is the HDFS API that allows this:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
> Drill uses the Hadoop ParquetFileWriter 
> (https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java).
> This is where the file creation occurs, so it might be tricky.
> However, ParquetRecordWriter.java 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java)
>  in Drill creates the ParquetFileWriter with a Hadoop Configuration object.
> Something to explore: could the block size be set as a property on the 
> Configuration object before passing it to the ParquetFileWriter constructor?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024983#comment-16024983
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/789
  
Are the changes only in 1494915dbef5dbd5996c19d0a2e89ca450a8ae3a (to cherry 
pick)?


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5356) Refactor Parquet Record Reader

2017-05-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5356:
---
Labels: ready-to-commit  (was: )

> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4824) Add not-provided and null states for map and list fields in JSON

2017-05-25 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-4824:
---
Summary: Add not-provided and null states for map and list fields in JSON  
(was: JSON with complex nested data produces incorrect output with missing 
fields)

> Add not-provided and null states for map and list fields in JSON
> 
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Roman
>Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---+
> |  Field1   |
> +---+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2" 
> {"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}
> There is no need to output missing fields. In the case of a deeply nested 
> structure, we will get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--+
> | Field1   |
> +--+
> |{} 
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5539) drillbit.sh script breaks if the working directory contains spaces

2017-05-25 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024975#comment-16024975
 ] 

Paul Rogers commented on DRILL-5539:


On the surface, this looks pretty easy: just put quotes where needed. As it 
turns out, {{drillbit.sh}} calls {{drill-config.sh}} to do all the heavy 
lifting, and {{drill-config.sh}} is called by many of our scripts. It also 
does lots of path work to find the config files, directories, Java, 
and so on.

The scripts presently assume no spaces in directory names. Spaceless names 
are the general rule on Linux, but obviously Windows often uses spaces, most 
notably in the {{C:\Program Files}} directory.

Further, we have a unit test (not yet checked in) for the scripts that should 
be modified to test for the case you found. See DRILL-5540 for a request to 
check the shell script unit tests into Apache Drill master.

> drillbit.sh script breaks if the working directory contains spaces
> --
>
> Key: DRILL-5539
> URL: https://issues.apache.org/jira/browse/DRILL-5539
> Project: Apache Drill
>  Issue Type: Bug
> Environment: Linux
>Reporter: Muhammad Gelbana
>
> The following output occurred when we tried running the drillbit.sh script in 
> a path that contains spaces: */home/folder1/Folder Name/drill/bin*
> {noformat}
> [mgelbana@regression-sysops bin]$ ./drillbit.sh start
> ./drillbit.sh: line 114: [: /home/folder1/Folder: binary operator expected
> Starting drillbit, logging to /home/folder1/Folder Name/drill/log/drillbit.out
> ./drillbit.sh: line 147: $pid: ambiguous redirect
> [mgelbana@regression-sysops bin]$ pwd
> /home/folder1/Folder Name/drill/bin
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5540) Provide unit tests for the Drill shell scripts

2017-05-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5540:
--

 Summary: Provide unit tests for the Drill shell scripts
 Key: DRILL-5540
 URL: https://issues.apache.org/jira/browse/DRILL-5540
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
 Fix For: 1.11.0


The Drill-on-YARN project created a unit test that exercises the Drill shell 
scripts to ensure that they work as expected. (It is very hard to debug the 
scripts when launched under YARN, so we had to fully test them stand-alone to 
ensure that they work properly under YARN.)

This ticket asks to commit those scripts to Drill separately from the large DoY 
commit, as the YARN dependencies can be easily removed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5467) Issue with column alias for nested table calculated columns

2017-05-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-5467.
---
Resolution: Duplicate

Duplicates DRILL-5537.

> Issue with column alias for nested table calculated columns
> ---
>
> Key: DRILL-5467
> URL: https://issues.apache.org/jira/browse/DRILL-5467
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rakesh
>Assignee: Vitalii Diravka
>
> The column alias is not always correctly used in the output. When columns are 
> calculated in a nested table, the outermost project doesn't show the column 
> alias correctly:
> SELECT `Custom_SQL_Query`.`Bucket` AS `Bucket`,
>   SUM(`Custom_SQL_Query`.`male`) AS `sum_male`
> FROM (SELECT first_name as `Bucket`, salary as `num`, case when gender = 'M' 
> then 1 else 0 end as male, case when gender = 'F' then 1 else 0 end as female 
> FROM cp.`employee.json`) `Custom_SQL_Query`
> GROUP BY `Custom_SQL_Query`.`Bucket`
> Here 'sum_male' appears as $f1 instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5537) Display columns alias for queries with sum() when RDBMS storage plugin is enabled

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024768#comment-16024768
 ] 

ASF GitHub Bot commented on DRILL-5537:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/845

DRILL-5537: Display columns alias for queries with sum() when RDBMS s…

…torage plugin is enabled

For sum() queries, the DrillConvertSumToSumZero rule is applied. But during 
conversion to the new aggregate call, the call was created with its name set to null, 
so the column alias was lost when the RDBMS storage plugin was enabled. The RDBMS 
storage plugin was adding a new rule during the PHYSICAL phase - ReduceProjectRule; 
since the project stage was omitted, the column alias was lost. With this fix, even if 
the project stage is omitted, the column alias will still be shown.

Changes:
1. Added the old aggregate call's name during new aggregate call creation in the 
DrillConvertSumToSumZero rule.
2. Replaced the deprecated AggregateCall constructor with `AggregateCall.create`.
3. Minor refactoring.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5537

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #845


commit 5e83d6d17232d4ddbff7e11eaadecad9ef992b10
Author: Arina Ielchiieva 
Date:   2017-05-25T13:23:43Z

DRILL-5537: Display columns alias for queries with sum() when RDBMS storage 
plugin is enabled




> Display columns alias for queries with sum() when RDBMS storage plugin is 
> enabled
> -
>
> Key: DRILL-5537
> URL: https://issues.apache.org/jira/browse/DRILL-5537
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> When [RDBMS storage 
> plugin|https://drill.apache.org/docs/rdbms-storage-plugin/]  is enabled, 
> the alias is not displayed for a column with the sum function:
> {noformat}
> 0: jdbc:drill:zk=local> select version, sum(1) as s from sys.version group by 
> version;
> +--+--+
> | version  | $f1  |
> +--+--+
> | 1.11.0-SNAPSHOT  | 1|
> +--+--+
> 1 row selected (0.444 seconds)
> {noformat}
> Other functions, like avg and count, are not affected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5538) Exclude ProjectRemoveRule during PHYSICAL phase if it comes from storage plugins

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024727#comment-16024727
 ] 

ASF GitHub Bot commented on DRILL-5538:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/844

DRILL-5538: Exclude ProjectRemoveRule during PHYSICAL phase if it com…

…es from storage plugins

Details in DRILL-5538 description.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5538

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #844


commit 874edfc86d4f69ecb917bd158b6afc1282ff34e7
Author: Arina Ielchiieva 
Date:   2017-05-25T11:34:31Z

DRILL-5538: Exclude ProjectRemoveRule during PHYSICAL phase if it comes 
from storage plugins




> Exclude ProjectRemoveRule during PHYSICAL phase if it comes from storage 
> plugins
> 
>
> Key: DRILL-5538
> URL: https://issues.apache.org/jira/browse/DRILL-5538
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> When [RDBMS storage 
> plugin|https://drill.apache.org/docs/rdbms-storage-plugin/]  is enabled, 
> during query execution certain JDBC rules are added.
> One of the rules is 
> [ProjectRemoveRule|https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L140].
>  Drill also uses this rule, but only during the phases when it considers it useful, for 
> example, during LOGICAL and JOIN_PLANNING. In contrast, storage plugin 
> rules are added to every phase of query planning. Thus the project 
> stage can be removed when it is actually needed.
> Sometimes ProjectRemoveRule decides that a project is trivial and removes 
> it, even though during that stage Drill added a column alias or removed implicit columns.
> For example, with RDBMS plugin enabled, alias is not displayed for simple 
> query:
> {noformat}
> 0: jdbc:drill:zk=local> create temporary table t as select * from sys.version;
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> 1 row selected (0.623 seconds)
> 0: jdbc:drill:zk=local> select version as current_version from t;
> +--+
> | version  |
> +--+
> | 1.11.0-SNAPSHOT  |
> +--+
> 1 row selected (0.28 seconds)
> {noformat}
> The proposed fix is to exclude ProjectRemoveRule during the PHYSICAL phase if it 
> comes from storage plugins, to prevent Drill from losing column aliases or displaying 
> implicit columns.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5539) drillbit.sh script breaks if the working directory contains spaces

2017-05-25 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5539:
---

 Summary: drillbit.sh script breaks if the working directory 
contains spaces
 Key: DRILL-5539
 URL: https://issues.apache.org/jira/browse/DRILL-5539
 Project: Apache Drill
  Issue Type: Bug
 Environment: Linux
Reporter: Muhammad Gelbana


The following output occurred when we tried running the drillbit.sh script in a 
path that contains spaces: */home/folder1/Folder Name/drill/bin*

{noformat}
[mgelbana@regression-sysops bin]$ ./drillbit.sh start
./drillbit.sh: line 114: [: /home/folder1/Folder: binary operator expected
Starting drillbit, logging to /home/folder1/Folder Name/drill/log/drillbit.out
./drillbit.sh: line 147: $pid: ambiguous redirect
[mgelbana@regression-sysops bin]$ pwd
/home/folder1/Folder Name/drill/bin
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5538) Exclude ProjectRemoveRule during PHYSICAL phase if it comes from storage plugins

2017-05-25 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5538:
---

 Summary: Exclude ProjectRemoveRule during PHYSICAL phase if it 
comes from storage plugins
 Key: DRILL-5538
 URL: https://issues.apache.org/jira/browse/DRILL-5538
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva


When [RDBMS storage plugin|https://drill.apache.org/docs/rdbms-storage-plugin/] 
 is enabled, during query execution certain JDBC rules are added.
One of the rules is 
[ProjectRemoveRule|https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L140].
 Drill also uses this rule, but only during the phases when it considers it useful, for 
example, during LOGICAL and JOIN_PLANNING. In contrast, storage plugin 
rules are added to every phase of query planning. Thus the project 
stage can be removed when it is actually needed.

Sometimes ProjectRemoveRule decides that a project is trivial and removes 
it, even though during that stage Drill added a column alias or removed implicit columns.

For example, with RDBMS plugin enabled, alias is not displayed for simple query:
{noformat}
0: jdbc:drill:zk=local> create temporary table t as select * from sys.version;
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
1 row selected (0.623 seconds)
0: jdbc:drill:zk=local> select version as current_version from t;
+--+
| version  |
+--+
| 1.11.0-SNAPSHOT  |
+--+
1 row selected (0.28 seconds)
{noformat}

The proposed fix is to exclude ProjectRemoveRule during the PHYSICAL phase if it comes 
from storage plugins, to prevent Drill from losing column aliases or displaying 
implicit columns.
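
For illustration, a minimal sketch of the proposed exclusion. Drill's actual plumbing
for selecting rules per planning phase differs; the method shape here is invented:
{code}
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.rel.rules.ProjectRemoveRule;

public class ExcludeProjectRemoveSketch {
  // Given the rules a storage plugin contributes, drop any ProjectRemoveRule
  // before handing the set to the PHYSICAL planning phase.
  public static Set<RelOptRule> forPhysicalPhase(Set<RelOptRule> pluginRules) {
    return pluginRules.stream()
        .filter(rule -> !(rule instanceof ProjectRemoveRule))
        .collect(Collectors.toSet());
  }
}
{code}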



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5537) Display columns alias for queries with sum() when RDBMS storage plugin is enabled

2017-05-25 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5537:
---

 Summary: Display columns alias for queries with sum() when RDBMS 
storage plugin is enabled
 Key: DRILL-5537
 URL: https://issues.apache.org/jira/browse/DRILL-5537
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva


When [RDBMS storage plugin|https://drill.apache.org/docs/rdbms-storage-plugin/] 
 is enabled, the alias is not displayed for a column with the sum function:
{noformat}
0: jdbc:drill:zk=local> select version, sum(1) as s from sys.version group by 
version;
+--+--+
| version  | $f1  |
+--+--+
| 1.11.0-SNAPSHOT  | 1|
+--+--+
1 row selected (0.444 seconds)
{noformat}
Other functions, like avg and count, are not affected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)