[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-03-28 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946106#comment-15946106
 ] 

Kunal Khatua commented on DRILL-5258:
-

Closing as no QA verification is required

> Allow "extended" mock tables access from SQL queries
> 
>
>                 Key: DRILL-5258
>                 URL: https://issues.apache.org/jira/browse/DRILL-5258
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.
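
The naming rule described above can be sketched in a few lines. This is only an illustration, not the code from the pull request: the class name MockTableRule and the method isJsonSpec are hypothetical, the table name employees_10K is a made-up example, and the check is assumed to be case-sensitive.

```java
public class MockTableRule {

  /**
   * Hypothetical helper: returns true when the table name should be treated
   * as a JSON specification file, false when the table and column names
   * themselves carry the data generation information.
   */
  public static boolean isJsonSpec(String tableName) {
    // Assumption: a plain, case-sensitive suffix check.
    return tableName != null && tableName.endsWith(".json");
  }

  public static void main(String[] args) {
    System.out.println(isJsonSpec("example/mock-options.json")); // true
    System.out.println(isJsonSpec("employees_10K"));             // false
  }
}
```

Because the simple SQL generator never uses JSON files, a suffix test like this should be enough to keep the two forms from colliding.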



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-03-28 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945702#comment-15945702
 ] 

Paul Rogers commented on DRILL-5258:


Development-only issue, no QA verification needed.



[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892995#comment-15892995
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/752




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883847#comment-15883847
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/752
  
Thanks for the change. LGTM. +1




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883810#comment-15883810
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057839
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/BooleanGen.java 
---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mock;
+
+import java.util.Random;
+
+import org.apache.drill.exec.vector.BitVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+public class BooleanGen implements FieldGen {
+
+  Random rand = new Random( );
+
+  @Override
+  public void setup(ColumnDef colDef) { }
+
+  public int value( ) {
--- End diff --

Fixed.




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883809#comment-15883809
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057873
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockGroupScanPOP.java
 ---
@@ -75,20 +76,50 @@
*/
 
   private boolean extended;
+  private ScanStats scanStats = ScanStats.TRIVIAL_TABLE;
 
   @JsonCreator
   public MockGroupScanPOP(@JsonProperty("url") String url,
-  @JsonProperty("extended") Boolean extended,
   @JsonProperty("entries") List<MockScanEntry> readEntries) {
 super((String) null);
 this.readEntries = readEntries;
 this.url = url;
-this.extended = extended == null ? false : extended;
+
+// Compute decent row-count stats for this mock data source so that
+// the planner is "fooled" into thinking that this operator wil do
+// disk I/O.
+
+int rowCount = 0;
+int rowWidth = 0;
+for (MockScanEntry entry : readEntries) {
+  rowCount += entry.getRecords();
+  int width = 0;
+  if (entry.getTypes() == null) {
+width = 50;
+  } else {
+for (MockColumn col : entry.getTypes()) {
+  int colWidth = 0;
+  if (col.getWidthValue() == 0) {
+colWidth = TypeHelper.getSize(col.getMajorType());
+  } else {
+colWidth = col.getWidthValue();
+  }
+  colWidth *= col.getRepeatCount();
+  width += colWidth;
+}
+  }
+  rowWidth = Math.max(rowWidth, width);
--- End diff --

Revised names and added comments to make clear what's going on.
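
For readers following the review, the estimate in the diff above can be sketched as a standalone class using the clearer names suggested in review (maxRowWidth for the overall maximum, rowWidth for one entry). This is a rough sketch, not the committed code: Column and Entry are hypothetical stand-ins for the Drill classes, and typeSize stands in for whatever TypeHelper.getSize would return.

```java
import java.util.Arrays;
import java.util.List;

public class MockScanStatsSketch {

  /** Hypothetical stand-in for a mock column definition. */
  public static class Column {
    final int width;       // explicit width, or 0 to fall back to the type size
    final int typeSize;    // assumed default size of the column's type
    final int repeatCount; // how many times the column repeats in a row

    public Column(int width, int typeSize, int repeatCount) {
      this.width = width;
      this.typeSize = typeSize;
      this.repeatCount = repeatCount;
    }
  }

  /** Hypothetical stand-in for one scan entry (a group of generated rows). */
  public static class Entry {
    final int records;
    final List<Column> columns; // may be null when no columns are declared

    public Entry(int records, List<Column> columns) {
      this.records = records;
      this.columns = columns;
    }
  }

  // The diff uses 50 as the width when an entry declares no columns.
  static final int DEFAULT_ROW_WIDTH = 50;

  /** Sums the record counts of all entries. */
  public static int totalRowCount(List<Entry> entries) {
    int rowCount = 0;
    for (Entry entry : entries) {
      rowCount += entry.records;
    }
    return rowCount;
  }

  /** Returns the widest estimated row across all entries. */
  public static int maxRowWidth(List<Entry> entries) {
    int maxRowWidth = 0;
    for (Entry entry : entries) {
      int rowWidth = 0;
      if (entry.columns == null) {
        rowWidth = DEFAULT_ROW_WIDTH;
      } else {
        for (Column col : entry.columns) {
          int colWidth = col.width == 0 ? col.typeSize : col.width;
          rowWidth += colWidth * col.repeatCount;
        }
      }
      maxRowWidth = Math.max(maxRowWidth, rowWidth);
    }
    return maxRowWidth;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry(100, Arrays.asList(new Column(0, 4, 1), new Column(10, 0, 2))),
        new Entry(50, null));
    System.out.println(totalRowCount(entries)); // 150
    System.out.println(maxRowWidth(entries));   // 50
  }
}
```

Here the first entry estimates 4 + 10 * 2 = 24 bytes per row and the second falls back to 50, so the maximum is 50.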




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883811#comment-15883811
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103058784
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
 ---
@@ -124,7 +125,7 @@ public void interpreterDateTest() throws Exception {
 final BitControl.PlanFragment planFragment = 
BitControl.PlanFragment.getDefaultInstance();
 final QueryContextInformation queryContextInfo = 
planFragment.getContext();
 final int timeZoneIndex = 
queryContextInfo.getTimeZone();
-final org.joda.time.DateTimeZone timeZone = 
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+final DateTimeZone timeZone =
DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
--- End diff --

Original code, but fixed.




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883812#comment-15883812
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103058623
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java 
---
@@ -60,14 +62,26 @@
  * The mode is one of the supported Drill
  * {@link DataMode} names: usually OPTIONAL or REQUIRED.
  * 
+ * 
+ * Recent extensions include:
+ * 
+ * repeat in either the "entry" or "record" elements allow
--- End diff --

Yes. Added the property to MockScanEntry. Need to add it to the 
implementation as well, which is planned, but not yet complete.




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883813#comment-15883813
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057899
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngine.java
 ---
@@ -89,14 +85,30 @@ public boolean supportsRead() {
 return true;
   }
 
-//  public static class ImplicitTable extends DynamicDrillTable {
-//
-//public ImplicitTable(StoragePlugin plugin, String storageEngineName,
-//Object selection) {
-//  super(plugin, storageEngineName, selection);
-//}
-//
-//  }
+  /**
+   * Resolves table names within the mock data source. Tables can be of two forms:
+   * 
+   * <name>_<n><unit>
+   * 
+   * Where the "name" can be anything, "n" is the number of rows, and "unit" is
+   * the units for the row count: non, K (thousand) or M (million).
+   * 
+   * The above form generates a table directly with no other information needed.
+   * Column names must be provided, and must be of the form:
+   * 
+   * <name>_<type><length>
+   * 
+   * Where the name can be anything, the type must be i (integer), d (double)
+   * or s (string, AKA VarChar). The length is needed only for string fields.

Fixed.
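
As a rough illustration of the column naming convention in the Javadoc above, the sketch below parses a column name into its parts. It assumes the shape is a name, an underscore, one type letter (i, d, or s), and an optional length, for example name_s50 (a made-up column). The class MockColumnName and its parse method are hypothetical and are not part of Drill.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MockColumnName {

  // Assumed shape: name, underscore, type letter, optional length digits.
  private static final Pattern FORMAT = Pattern.compile("(\\w+)_([ids])(\\d*)");

  public final String name;
  public final char type;   // 'i' (integer), 'd' (double) or 's' (string)
  public final int length;  // 0 when no length was given

  private MockColumnName(String name, char type, int length) {
    this.name = name;
    this.type = type;
    this.length = length;
  }

  /** Parses a column name, rejecting names that do not fit the assumed shape. */
  public static MockColumnName parse(String columnName) {
    Matcher m = FORMAT.matcher(columnName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a mock column name: " + columnName);
    }
    String digits = m.group(3);
    int length = digits.isEmpty() ? 0 : Integer.parseInt(digits);
    return new MockColumnName(m.group(1), m.group(2).charAt(0), length);
  }

  public static void main(String[] args) {
    MockColumnName col = parse("name_s50");
    System.out.println(col.name + " " + col.type + " " + col.length); // name s 50
  }
}
```

Supporting a further type letter, such as the boolean raised in review, would only require adding it to the character class in the pattern.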




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877199#comment-15877199
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102294277
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/BooleanGen.java 
---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mock;
+
+import java.util.Random;
+
+import org.apache.drill.exec.vector.BitVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+public class BooleanGen implements FieldGen {
+
+  Random rand = new Random( );
+
+  @Override
+  public void setup(ColumnDef colDef) { }
+
+  public int value( ) {
--- End diff --

Extra space between `()`




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877195#comment-15877195
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102320948
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngine.java
 ---
@@ -89,14 +85,30 @@ public boolean supportsRead() {
 return true;
   }
 
-//  public static class ImplicitTable extends DynamicDrillTable {
-//
-//public ImplicitTable(StoragePlugin plugin, String storageEngineName,
-//Object selection) {
-//  super(plugin, storageEngineName, selection);
-//}
-//
-//  }
+  /**
+   * Resolves table names within the mock data source. Tables can be of two forms:
+   * 
+   * <name>_<n><unit>
+   * 
+   * Where the "name" can be anything, "n" is the number of rows, and "unit" is
+   * the units for the row count: non, K (thousand) or M (million).
+   * 
+   * The above form generates a table directly with no other information needed.
+   * Column names must be provided, and must be of the form:
+   * 
+   * <name>_<type><length>
+   * 
+   * Where the name can be anything, the type must be i (integer), d (double)
+   * or s (string, AKA VarChar). The length is needed only for string fields.

How about boolean (b) as a type?




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877198#comment-15877198
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102360734
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockGroupScanPOP.java
 ---
@@ -75,20 +76,50 @@
*/
 
   private boolean extended;
+  private ScanStats scanStats = ScanStats.TRIVIAL_TABLE;
 
   @JsonCreator
   public MockGroupScanPOP(@JsonProperty("url") String url,
-  @JsonProperty("extended") Boolean extended,
   @JsonProperty("entries") List<MockScanEntry> readEntries) {
 super((String) null);
 this.readEntries = readEntries;
 this.url = url;
-this.extended = extended == null ? false : extended;
+
+// Compute decent row-count stats for this mock data source so that
+// the planner is "fooled" into thinking that this operator wil do
+// disk I/O.
+
+int rowCount = 0;
+int rowWidth = 0;
+for (MockScanEntry entry : readEntries) {
+  rowCount += entry.getRecords();
+  int width = 0;
+  if (entry.getTypes() == null) {
+width = 50;
+  } else {
+for (MockColumn col : entry.getTypes()) {
+  int colWidth = 0;
+  if (col.getWidthValue() == 0) {
+colWidth = TypeHelper.getSize(col.getMajorType());
+  } else {
+colWidth = col.getWidthValue();
+  }
+  colWidth *= col.getRepeatCount();
+  width += colWidth;
+}
+  }
+  rowWidth = Math.max(rowWidth, width);
--- End diff --

`rowWidth` seems to be `maxRowWidth`, and `width` is `rowWidth`. Can we 
please rename these?




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877196#comment-15877196
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102298318
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
 ---
@@ -124,7 +125,7 @@ public void interpreterDateTest() throws Exception {
 final BitControl.PlanFragment planFragment = 
BitControl.PlanFragment.getDefaultInstance();
 final QueryContextInformation queryContextInfo = 
planFragment.getContext();
 final int timeZoneIndex = 
queryContextInfo.getTimeZone();
-final org.joda.time.DateTimeZone timeZone = 
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+final DateTimeZone timeZone =
DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
--- End diff --

Please remove extra space after `=`




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877197#comment-15877197
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102336171
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java 
---
@@ -60,14 +62,26 @@
  * The mode is one of the supported Drill
  * {@link DataMode} names: usually OPTIONAL or 
REQUIRED.
  * 
+ * 
+ * Recent extensions include:
+ * 
+ * repeat in either the "entry" or "record" elements allow
--- End diff --

I found a `repeat` definition in `MockColumn` but not in `MockScanEntry`, 
whereas both this comment and the example file `example-mock.json` show the 
`repeat` property at the entry level. Is this a work in progress?




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877194#comment-15877194
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r102362622
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngine.java
 ---
@@ -109,7 +121,37 @@ public MockSchema(MockStorageEngine engine) {
 
 @Override
 public Table getTable(String name) {
-  Pattern p = Pattern.compile("(\\w+)_(\\d+)(k|m)?", 
Pattern.CASE_INSENSITIVE);
+  if (name.toLowerCase().endsWith(".json") ) {
+return getConfigFile(name);
+  } else {
+return getDirectTable(name);
+  }
+}
+
+private Table getConfigFile(String name) {
+  final URL url = Resources.getResource(name);
+  if (url == null) {
+throw new IllegalArgumentException(
+"Unable to find mock table config file " + name);
+  }
+  MockTableDef mockTableDefn;
+  try {
+String json = Resources.toString(url, Charsets.UTF_8);
+final ObjectMapper mapper = new ObjectMapper();
+mapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, 
true);
+mockTableDefn = mapper.readValue(json, MockTableDef.class);
+  } catch (JsonParseException e) {
+throw new IllegalArgumentException( "Unable to parse mock table 
definition file: " + name, e );
+  } catch (JsonMappingException e) {
+throw new IllegalArgumentException( "Unable to Jackson deserialize 
mock table definition file: " + name, e );
+  } catch (IOException e) {
+throw new IllegalArgumentException( "Unable to read mock table 
definition file: " + name, e );
+  }
+  return new DynamicDrillTable(engine, this.name, 
mockTableDefn.getEntries() );
--- End diff --

Please remove the extra space before the last `)`. There are a few other places 
too, such as line 169 in this file and line 199 in `MockTableDef.java`.




[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872710#comment-15872710
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/752

DRILL-5258: Access mock data definition from SQL

Extends the mock data source to allow using the full power of the mock
data source from an SQL query by referencing the JSON definition
file. See JIRA and package-info for details.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5258

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #752


commit eb9860d4365f60da3b3c22fc0f96a9acfd31ed5c
Author: Paul Rogers 
Date:   2017-02-14T18:02:13Z

DRILL-5258: Access mock data definition from SQL

Extends the mock data source to allow using the full power of the mock
data source from an SQL query by referencing the JSON definition
file. See JIRA and package-info for details.



