[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2018-06-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498294#comment-16498294
 ] 

ASF GitHub Bot commented on DRILL-3353:
---

ilooner commented on issue #86: DRILL-3353: Fix dropping nested fields
URL: https://github.com/apache/drill/pull/86#issuecomment-393951013
 
 
   This was already merged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Non data-type related schema changes errors
> ---
>
> Key: DRILL-3353
> URL: https://issues.apache.org/jira/browse/DRILL-3353
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Oscar Bernal
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
>
>
> I'm having trouble querying a data set with a varying schema for nested 
> object fields. The majority of my data for a specific type of record has the 
> following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly 
> different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 
> 'Teste-FB-Engagement-Puro-iOS-230615';
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on 
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type 
> when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File  file.json
> Record  35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on 
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only 
> return a subset of the fields, ignoring the others. 
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +-------------------------------------------+
> | EXPR$0                                    |
> +-------------------------------------------+
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> +-------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with 
> different schemas for different record types, and all other queries seem to 
> run well.
> Is there something about how Drill infers schema that I might be missing 
> here? Does it infer based on a sample % of the data and fail for records that 
> were not taken into account while inferring schema? I suspect I wouldn't have 
> this error if I had 100's of records with that other schema inside the file, 
> but I can't find anything in the docs or code to support that hypothesis. 
> Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it is 
> very vague, implying only that Drill doesn't fully support schema changes. I 
> thought that applied mostly to data type changes, for which there are other 
> well-documented issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2018-06-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498293#comment-16498293
 ] 

ASF GitHub Bot commented on DRILL-3353:
---

ilooner closed pull request #86: DRILL-3353: Fix dropping nested fields
URL: https://github.com/apache/drill/pull/86
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/codegen/templates/TypeHelper.java b/exec/java-exec/src/main/codegen/templates/TypeHelper.java
index d6ccd3a9bf..9c66cb718d 100644
--- a/exec/java-exec/src/main/codegen/templates/TypeHelper.java
+++ b/exec/java-exec/src/main/codegen/templates/TypeHelper.java
@@ -91,10 +91,10 @@ public static SqlAccessor getSqlAccessor(ValueVector vector){
     throw new UnsupportedOperationException(buildErrorMessage("find sql accessor", type));
   }
   
-  public static ValueVector getNewVector(SchemaPath parentPath, String name, BufferAllocator allocator, MajorType type){
+  public static ValueVector getNewVector(SchemaPath parentPath, String name, BufferAllocator allocator, MajorType type, CallBack callback){
     SchemaPath child = parentPath.getChild(name);
     MaterializedField field = MaterializedField.create(child, type);
-    return getNewVector(field, allocator);
+    return getNewVector(field, allocator, callback);
   }
   
   
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
index 0fe79d90c8..e109ec07fa 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
@@ -21,6 +21,7 @@
 
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.util.CallBack;
 import org.apache.drill.exec.vector.ValueVector;
 
 /**
@@ -61,4 +62,10 @@
    * @return A DrillBuf that will be released at the end of the current query (and can be resized as desired during use).
    */
   public DrillBuf getManagedBuffer();
+
+  /**
+   *
+   * @return the CallBack object for this mutator
+   */
+  public CallBack getCallBack();
 }
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
index 6bf1280ae0..fa454b7826 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
@@ -51,6 +51,7 @@
 import org.apache.drill.exec.store.RecordReader;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
+import org.apache.drill.exec.util.CallBack;
 import org.apache.drill.exec.vector.AllocationHelper;
 import org.apache.drill.exec.vector.NullableVarCharVector;
 import org.apache.drill.exec.vector.SchemaChangeCallBack;
@@ -210,6 +211,11 @@ public IterOutcome next() {
 
   populatePartitionVectors();
 
+  for (VectorWrapper w : container) {
+    w.getValueVector().getMutator().setValueCount(recordCount);
+  }
+
+
   // this is a slight misuse of this metric but it will allow Readers to report how many records they generated.
   final boolean isNewSchema = mutator.isNewSchema();
   oContext.getStats().batchReceived(0, getRecordCount(), isNewSchema);
@@ -344,6 +350,11 @@ public boolean isNewSchema() {
 public DrillBuf getManagedBuffer() {
   return oContext.getManagedBuffer();
 }
+
+@Override
+public CallBack getCallBack() {
+  return callBack;
+}
   }
 
   @Override
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
index 516b0282fb..10f1d7fbf7 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
@@ -276,7 +276,7 @@ private void purge() throws SchemaChangeException {
 SimpleRecordBatch batch = new SimpleRecordBatch(c, selectionVector4, context);
 SimpleRecordBatch newBatch = new SimpleRecordBatch(newContainer, null, context);
 if (copier == null) {
-  copier = RemovingRecordBatch.getGenerated4Copier(batch, context, oContext.getAllocator(),  newContainer, newBatch);
+  copier = RemovingRecordBatch.getGenerated4Copier(batch, context, oContext.getAllocator(),  newContainer, newBatch, null);
 

[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2016-02-01 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127389#comment-15127389
 ] 

Chun Chang commented on DRILL-3353:
---

1.5.0-SNAPSHOT  | 9ff947288f3214fe8e525e001d89a4f91b8b0728

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select log.event.attributes from 
dfs.`/drill/testdata/drill-3353/jira-3353.json` as log where log.si = 
'07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
+--------+
| EXPR$0 |
+--------+
| {"logged":"no","wearable":"no","type":"organic","daysSinceInstall":"0","destination":"none","nth":"1"} |
| {"logged":"no","wearable":"no","type":"facebook","daysSinceInstall":"0","destination":"none","nth":"2","ad":"Teste-FB-Engagement-Puro-iOS-230615","adSet":"Teste-FB-Engagement-Puro-iOS-230615","campaign":"Teste-FB-Engagement-Puro-iOS-230615","channel":"Facebook-App-Engagement"} |
| {"logged":"no","wearable":"no","type":"facebook","daysSinceInstall":"0","destination":"none","nth":"3"} |
| {"logged":"no","wearable":"no","type":"branch","daysSinceInstall":"0","destination":"none","nth":"4","adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords"} |
| {"logged":"no","wearable":"no","type":"organic","daysSinceInstall":"0","destination":"none","nth":"5"} |
+--------+
{noformat}


[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2015-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640613#comment-14640613
 ] 

ASF GitHub Bot commented on DRILL-3353:
---

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/86#issuecomment-124561688
  
+1


 Non data-type related schema changes errors
 ---

 Key: DRILL-3353
 URL: https://issues.apache.org/jira/browse/DRILL-3353
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Hanifi Gunes
 Fix For: 1.2.0

 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip




[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2015-07-13 Thread Oscar Bernal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624934#comment-14624934
 ] 

Oscar Bernal commented on DRILL-3353:
-

Thanks so much for the help and for addressing this!

 Non data-type related schema changes errors
 ---

 Key: DRILL-3353
 URL: https://issues.apache.org/jira/browse/DRILL-3353
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Hanifi Gunes
 Fix For: 1.2.0

 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip




[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2015-07-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621555#comment-14621555
 ] 

ASF GitHub Bot commented on DRILL-3353:
---

GitHub user StevenMPhillips opened a pull request:

https://github.com/apache/drill/pull/86

DRILL-3353: Fix dropping nested fields

Use the SchemaChangeCallBack in more places to track schema changes
Reset the ephemeral transfer pair when making a new transfer pair for Map 
or RepeatedMap
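
As a rough illustration of the first point in this description, the sketch below shows a nested container that notifies a callback whenever a new child field appears, so that a consumer holding a cached field list can tell it is stale. Every class and method name here (`SchemaChangeFlag`, `NestedMap`, `notifyChange`, `addField`) is hypothetical and chosen for illustration; the sketch does not reproduce Drill's `SchemaChangeCallBack` or its vector classes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical callback: remembers whether the schema changed since the last check.
class SchemaChangeFlag {
    private boolean changed;

    void notifyChange() { changed = true; }

    // Returns true if a change was recorded, then clears the flag.
    boolean getAndReset() {
        boolean was = changed;
        changed = false;
        return was;
    }
}

// Hypothetical nested map column that reports new child fields through the callback.
class NestedMap {
    private final Set<String> fieldNames = new LinkedHashSet<>();
    private final SchemaChangeFlag callback;

    NestedMap(SchemaChangeFlag callback) { this.callback = callback; }

    void addField(String name) {
        // Set.add returns true only when the field was not present before.
        if (fieldNames.add(name)) {
            callback.notifyChange();
        }
    }

    Set<String> fieldNames() { return fieldNames; }
}

class SchemaCallbackDemo {
    public static void main(String[] args) {
        SchemaChangeFlag flag = new SchemaChangeFlag();
        NestedMap attributes = new NestedMap(flag);

        attributes.addField("logged");
        attributes.addField("wearable");
        attributes.addField("type");
        flag.getAndReset();                      // consumer has seen the first three fields

        attributes.addField("type");             // already known: no schema change
        System.out.println(flag.getAndReset());  // false

        attributes.addField("adSet");            // new nested field: schema change
        System.out.println(flag.getAndReset());  // true
    }
}
```

If the container never signalled the change, a consumer that cached the first three field names would keep emitting only those, which resembles the truncated `attributes` output shown in the issue description.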

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StevenMPhillips/incubator-drill drill-3353

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/86.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #86


commit 6598d5efa99e7516a882ea17582d3014e13d3ca6
Author: Steven Phillips s...@apache.org
Date:   2015-07-09T00:35:09Z

DRILL-3353: Fix dropping nested fields

Use the SchemaChangeCallBack in more places to track schema changes
Reset the ephemeral transfer pair when making a new transfer pair for Map 
or RepeatedMap




 Non data-type related schema changes errors
 ---

 Key: DRILL-3353
 URL: https://issues.apache.org/jira/browse/DRILL-3353
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Steven Phillips
 Fix For: 1.2.0

 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip



[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2015-07-09 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621537#comment-14621537
 ] 

Steven Phillips commented on DRILL-3353:


There are several issues here.

1.
{code}
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when 
you are using a ValueWriter of type NullableVarCharWriterImpl.
{code}

This happens because, in one of the records, the boolean value true has quotes 
around it and is therefore parsed as a string. Drill does not currently 
support changing the type of a specific field. See DRILL-3228 and DRILL-3229 
for future work that will enhance our flexibility in this regard. The current 
workaround is to set all_text_mode to true, which you already know.

2.
{code}
Error: SYSTEM ERROR: java.lang.NumberFormatException: 
Teste-FB-Engagement-Puro-iOS-230615
{code}

This is due to a problem with implicit cast and null fields. I filed DRILL-3477 
for this issue.

3. Missing fields

This is due to some bugs in Drill's processing of complex data that occur in 
some operations when new fields are added.

I will be posting a fix for this shortly.
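
The first issue above can be pictured with a small, self-contained sketch. The classes below are hypothetical stand-ins, not Drill's actual writer classes, and the column name `flag` is made up since the offending field is not named in this thread. They only mimic the rule that a column's type is fixed by the first value written to it, and that converting every scalar to text first (the effect of all_text_mode) avoids the conflict.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a typed column writer; not Drill's real API.
class TypeLockedColumn {
    private final String name;
    private Class<?> lockedType;                 // decided by the first value written
    private final List<Object> values = new ArrayList<>();

    TypeLockedColumn(String name) { this.name = name; }

    void write(Object value) {
        if (lockedType == null) {
            lockedType = value.getClass();       // first value fixes the column type
        } else if (!lockedType.equals(value.getClass())) {
            throw new IllegalStateException("Column '" + name + "' holds "
                + lockedType.getSimpleName() + " but got "
                + value.getClass().getSimpleName());
        }
        values.add(value);
    }

    int size() { return values.size(); }
}

class TypeLockDemo {
    public static void main(String[] args) {
        TypeLockedColumn flag = new TypeLockedColumn("flag");
        flag.write("true");                      // quoted in JSON: read as a string
        try {
            flag.write(Boolean.TRUE);            // unquoted in a later record: a boolean
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Assumed analogue of all-text mode: every scalar is converted to a string first.
        TypeLockedColumn textFlag = new TypeLockedColumn("flag");
        textFlag.write("true");
        textFlag.write(String.valueOf(Boolean.TRUE));
        System.out.println(textFlag.size());     // 2
    }
}
```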

 Non data-type related schema changes errors
 ---

 Key: DRILL-3353
 URL: https://issues.apache.org/jira/browse/DRILL-3353
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Steven Phillips
 Fix For: 1.2.0

 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip



[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors

2015-07-06 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615910#comment-14615910
 ] 

Steven Phillips commented on DRILL-3353:


Is it possible to share your data? It would make it much easier to reproduce 
and fix the problem.

 Non data-type related schema changes errors
 ---

 Key: DRILL-3353
 URL: https://issues.apache.org/jira/browse/DRILL-3353
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Steven Phillips
 Fix For: 1.2.0

