[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498294#comment-16498294 ]

ASF GitHub Bot commented on DRILL-3353:
---

ilooner commented on issue #86: DRILL-3353: Fix dropping nested fields
URL: https://github.com/apache/drill/pull/86#issuecomment-393951013

This was already merged.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>            Priority: Major
>             Fix For: 1.5.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
>
>
> I'm having trouble querying a data set with a varying schema for nested
> object fields. The majority of my data for a specific type of record has the
> following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly
> different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad =
> 'Teste-FB-Engagement-Puro-iOS-230615';
> Error: SYSTEM ERROR: java.lang.NumberFormatException:
> Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type
> when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File file.json
> Record 35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only
> return a subset of the fields, ignoring the others.
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> ++
> | EXPR$0 |
> ++
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> | {"logged":"no","wearable":"no","type":""} |
> ++
> {noformat}
> What I find strange is that I have thousands of records in the same file with
> different schemas for different record types, and all other queries seem to
> run well.
> Is there something about how Drill infers schema that I might be missing
> here? Does it infer based on a sample % of the data and fail for records that
> were not taken into account while inferring schema? I suspect I wouldn't have
> this error if I had hundreds of records with that other schema inside the
> file, but I can't find anything in the docs or code to support that hypothesis.
> Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it is very
> vague in implying that Drill doesn't fully support schema changes. I thought that
> was mostly about data type changes, for which there are other well-documented
> issues.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498293#comment-16498293 ]

ASF GitHub Bot commented on DRILL-3353:
---

ilooner closed pull request #86: DRILL-3353: Fix dropping nested fields
URL: https://github.com/apache/drill/pull/86

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/codegen/templates/TypeHelper.java b/exec/java-exec/src/main/codegen/templates/TypeHelper.java
index d6ccd3a9bf..9c66cb718d 100644
--- a/exec/java-exec/src/main/codegen/templates/TypeHelper.java
+++ b/exec/java-exec/src/main/codegen/templates/TypeHelper.java
@@ -91,10 +91,10 @@ public static SqlAccessor getSqlAccessor(ValueVector vector){
     throw new UnsupportedOperationException(buildErrorMessage("find sql accessor", type));
   }
-  public static ValueVector getNewVector(SchemaPath parentPath, String name, BufferAllocator allocator, MajorType type){
+  public static ValueVector getNewVector(SchemaPath parentPath, String name, BufferAllocator allocator, MajorType type, CallBack callback){
     SchemaPath child = parentPath.getChild(name);
     MaterializedField field = MaterializedField.create(child, type);
-    return getNewVector(field, allocator);
+    return getNewVector(field, allocator, callback);
   }
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
index 0fe79d90c8..e109ec07fa 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/OutputMutator.java
@@ -21,6 +21,7 @@
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.util.CallBack;
 import org.apache.drill.exec.vector.ValueVector;
 /**
@@ -61,4 +62,10 @@
    * @return A DrillBuf that will be released at the end of the current query (and can be resized as desired during use).
    */
   public DrillBuf getManagedBuffer();
+
+  /**
+   *
+   * @return the CallBack object for this mutator
+   */
+  public CallBack getCallBack();
 }
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
index 6bf1280ae0..fa454b7826 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
@@ -51,6 +51,7 @@
 import org.apache.drill.exec.store.RecordReader;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
+import org.apache.drill.exec.util.CallBack;
 import org.apache.drill.exec.vector.AllocationHelper;
 import org.apache.drill.exec.vector.NullableVarCharVector;
 import org.apache.drill.exec.vector.SchemaChangeCallBack;
@@ -210,6 +211,11 @@ public IterOutcome next() {
       populatePartitionVectors();
+      for (VectorWrapper w : container) {
+        w.getValueVector().getMutator().setValueCount(recordCount);
+      }
+
+
       // this is a slight misuse of this metric but it will allow Readers to report how many records they generated.
       final boolean isNewSchema = mutator.isNewSchema();
       oContext.getStats().batchReceived(0, getRecordCount(), isNewSchema);
@@ -344,6 +350,11 @@ public boolean isNewSchema() {
     public DrillBuf getManagedBuffer() {
       return oContext.getManagedBuffer();
     }
+
+    @Override
+    public CallBack getCallBack() {
+      return callBack;
+    }
   }
   @Override
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
index 516b0282fb..10f1d7fbf7 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
@@ -276,7 +276,7 @@ private void purge() throws SchemaChangeException {
     SimpleRecordBatch batch = new SimpleRecordBatch(c, selectionVector4, context);
     SimpleRecordBatch newBatch = new SimpleRecordBatch(newContainer, null, context);
     if (copier == null) {
-      copier = RemovingRecordBatch.getGenerated4Copier(batch, context, oContext.getAllocator(), newContainer, newBatch);
+      copier = RemovingRecordBatch.getGenerated4Copier(batch, context, oContext.getAllocator(), newContainer, newBatch, null);
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127389#comment-15127389 ]

Chun Chang commented on DRILL-3353:
---

1.5.0-SNAPSHOT | 9ff947288f3214fe8e525e001d89a4f91b8b0728

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select log.event.attributes from dfs.`/drill/testdata/drill-3353/jira-3353.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
++
| EXPR$0 |
++
| {"logged":"no","wearable":"no","type":"organic","daysSinceInstall":"0","destination":"none","nth":"1"} |
| {"logged":"no","wearable":"no","type":"facebook","daysSinceInstall":"0","destination":"none","nth":"2","ad":"Teste-FB-Engagement-Puro-iOS-230615","adSet":"Teste-FB-Engagement-Puro-iOS-230615","campaign":"Teste-FB-Engagement-Puro-iOS-230615","channel":"Facebook-App-Engagement"} |
| {"logged":"no","wearable":"no","type":"facebook","daysSinceInstall":"0","destination":"none","nth":"3"} |
| {"logged":"no","wearable":"no","type":"branch","daysSinceInstall":"0","destination":"none","nth":"4","adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords"} |
| {"logged":"no","wearable":"no","type":"organic","daysSinceInstall":"0","destination":"none","nth":"5"} |
++
{noformat}

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.5.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640613#comment-14640613 ]

ASF GitHub Bot commented on DRILL-3353:
---

Github user hnfgns commented on the pull request:

    https://github.com/apache/drill/pull/86#issuecomment-124561688

    +1

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Hanifi Gunes
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624934#comment-14624934 ]

Oscar Bernal commented on DRILL-3353:
---

Thanks so much for the help and for addressing this!

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Hanifi Gunes
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621555#comment-14621555 ]

ASF GitHub Bot commented on DRILL-3353:
---

GitHub user StevenMPhillips opened a pull request:

    https://github.com/apache/drill/pull/86

    DRILL-3353: Fix dropping nested fields

    Use the SchemaChangeCallBack in more places to track schema changes
    Reset the ephemeral transfer pair when making a new transfer pair for Map or RepeatedMap

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StevenMPhillips/incubator-drill drill-3353

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/86.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #86

commit 6598d5efa99e7516a882ea17582d3014e13d3ca6
Author: Steven Phillips <s...@apache.org>
Date:   2015-07-09T00:35:09Z

    DRILL-3353: Fix dropping nested fields

    Use the SchemaChangeCallBack in more places to track schema changes
    Reset the ephemeral transfer pair when making a new transfer pair for Map or RepeatedMap

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
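The pull request description in this message says the fix uses the SchemaChangeCallBack in more places to track schema changes. As a rough illustration of that idea only, the Python sketch below shows a nested column that notifies a callback whenever a new child field appears, so a consumer can tell that the schema changed. It is a loose analogy, not Drill's code: SchemaChangeFlag, MapColumn, do_work and changed_and_reset are all hypothetical names invented for this example.

```python
class SchemaChangeFlag:
    """Hypothetical stand-in for a schema-change callback: remembers that a change happened."""

    def __init__(self):
        self._changed = False

    def do_work(self):
        # Called by a column whenever its set of child fields grows.
        self._changed = True

    def changed_and_reset(self):
        # Report whether a change was seen since the last check, then clear the flag.
        changed, self._changed = self._changed, False
        return changed


class MapColumn:
    """Hypothetical nested column that reports newly added child fields."""

    def __init__(self, callback=None):
        self.children = {}
        self.callback = callback

    def add_child(self, name):
        # Only a genuinely new field counts as a schema change.
        if name not in self.children:
            self.children[name] = []
            if self.callback is not None:
                self.callback.do_work()
        return self.children[name]


flag = SchemaChangeFlag()
attributes = MapColumn(callback=flag)

attributes.add_child("logged")
first = flag.changed_and_reset()    # new field, so a change is reported

attributes.add_child("logged")
second = flag.changed_and_reset()   # field already known, no change

attributes.add_child("adSet")
third = flag.changed_and_reset()    # a rare field shows up later in the data

print(first, second, third)  # True False True
```

If a column is created without a callback (the `callback=None` case), a late-arriving field such as adSet is added silently, which is one plausible way for downstream consumers to keep working with a stale view of the nested schema.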
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621537#comment-14621537 ]

Steven Phillips commented on DRILL-3353:
---

There are several issues here.

1.
{code}
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl.
{code}
This happens because in one of the records the boolean value true has quotes around it, so it is parsed as a string. Drill does not currently support changing the type of a specific field. See DRILL-3228 and DRILL-3229 for future work that will enhance our flexibility in this regard. The current workaround is to set all_text_mode to true, which you already know.

2.
{code}
Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615
{code}
This is due to a problem with implicit cast and null fields. I filed DRILL-3477 for this issue.

3. Missing fields
This is due to some bugs in Drill's processing of complex data that occur in some operations when new fields are added. I will be posting a fix for this shortly.

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
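The first point in the analysis in this message (a boolean that is quoted in one record and therefore read as a string) can be checked outside Drill before querying. The following is a minimal Python sketch, not a Drill feature: it assumes the file holds one JSON record per line, and the field name "active" in the sample records is hypothetical, chosen only to show a boolean/string conflict.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def collect_types(value, path, seen):
    """Record the type seen at each dotted path, recursing into nested objects."""
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, seen)
    else:
        seen[path].add(json_type(value))


def conflicting_fields(lines):
    """Return {path: sorted types} for paths seen with more than one non-null type."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if line:
            collect_types(json.loads(line), "", seen)
    return {
        path: sorted(types - {"null"})
        for path, types in seen.items()
        if len(types - {"null"}) > 1
    }


# Hypothetical records: "active" is a boolean in one and a quoted string in the other.
sample = [
    '{"si": "a", "event": {"attributes": {"logged": "no", "active": true}}}',
    '{"si": "b", "event": {"attributes": {"logged": "no", "active": "true"}}}',
]
print(conflicting_fields(sample))
# {'event.attributes.active': ['boolean', 'string']}
```

Any path reported here is a candidate for the DATA_READ error quoted above when all_text_mode is false; fields that are merely absent from some records are not flagged, since that is a different problem (point 3).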
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615910#comment-14615910 ]

Steven Phillips commented on DRILL-3353:
---

Is it possible to share your data? It would make it much easier to reproduce and fix the problem.

> Non data-type related schema changes errors
> ---
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
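For anyone trying to reproduce the report, one quick way to see how the nested attributes object varies across records is to count how many records carry each key. The sketch below is a minimal Python illustration and not part of Drill. It assumes one JSON record per line and the event.attributes nesting implied by the queries in the report; the two sample records are hypothetical wrappers around the two attribute shapes quoted there.

```python
import json
from collections import Counter


def nested_field_counts(lines, path=("event", "attributes")):
    """Count how many records contain each key of the nested object at `path`."""
    counts = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        node = json.loads(line)
        # Walk down the path; give up on this record if any level is missing.
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict):
            total += 1
            counts.update(node.keys())
    return total, counts


# Hypothetical records built around the two "attributes" shapes from the report.
records = [
    '{"si":"x","type":"Opens App","event":{"attributes":{"daysSinceInstall":0,'
    '"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}}',
    '{"si":"x","type":"Opens App","event":{"attributes":{'
    '"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset",'
    '"campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords",'
    '"daysSinceInstall":0,"destination":"none","logged":"no","nth":4,'
    '"type":"branch","wearable":"no"}}}',
]

total, counts = nested_field_counts(records)
# Fields present in only some of the records are the ones whose schema "varies".
sparse = sorted(name for name, n in counts.items() if n < total)
print(sparse)  # ['adSet', 'campaign', 'channel']
```

Run against the real file, a list like this shows which nested fields appear in only a handful of records, which are the fields the report says were being dropped from query results.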