[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615910#comment-14615910 ]
Steven Phillips commented on DRILL-3353:
----------------------------------------

Is it possible to share your data? It would make it much easier to reproduce and fix the problem.

> Non data-type related schema changes errors
> -------------------------------------------
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> I'm having trouble querying a data set with a varying schema for a nested
> object's fields. The majority of my data for a specific type of record has the
> following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly
> different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad =
> 'Teste-FB-Engagement-Puro-iOS-230615"';
> Error: SYSTEM ERROR: java.lang.NumberFormatException:
> Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type
> when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File file.json
> Record 35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only
> return a subset of the fields, ignoring the others.
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from
> `dfs`.`root`.`/file.json` as log where log.si =
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +----------------------------------------------------+
> | EXPR$0                                             |
> +----------------------------------------------------+
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> +----------------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with
> different schemas for different record types, and all other queries seem to
> run well.
> Is there something about how Drill infers schema that I might be missing
> here? Does it infer based on a sample % of the data and fail for records that
> were not taken into account while inferring schema? I suspect I wouldn't have
> this error if I had 100's of records with that other schema inside the file,
> but I can't find anything in the docs or code to support that hypothesis.
> Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it's very
> vague in implying Drill doesn't fully support schema changes. I thought that
> was for data type changes mostly, for which there are other well-documented
> issues.
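
If the data cannot be shared, one way to narrow down which records carry the minority schema is to group records by the shape of their `event.attributes` object. The following is a minimal Python sketch under stated assumptions, not a Drill feature: it assumes the file holds one JSON object per line and that the nested object lives at `event.attributes` (as the queries above suggest). The function names `profile_attributes` and `report` are hypothetical helpers introduced here for illustration.

```python
import json
from collections import defaultdict


def profile_attributes(lines):
    """Group line numbers by the (key, value type) shape of event.attributes.

    Assumption: each non-blank line is one complete JSON record.
    """
    shapes = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        attributes = (record.get("event") or {}).get("attributes")
        if not isinstance(attributes, dict):
            # Record has no nested attributes object; nothing to profile.
            continue
        # A shape is the sorted set of field names with their JSON value types.
        shape = tuple(sorted((key, type(value).__name__)
                             for key, value in attributes.items()))
        shapes[shape].append(line_no)
    return dict(shapes)


def report(shapes):
    """Print each shape, rarest first, with the first line where it occurs."""
    for shape, line_nos in sorted(shapes.items(), key=lambda item: len(item[1])):
        print(f"{len(line_nos)} record(s), first at line {line_nos[0]}:")
        for key, type_name in shape:
            print(f"    {key}: {type_name}")
```

Calling `report(profile_attributes(open("file.json")))` would list the rare shapes first, which may help produce a small, anonymized sample that still reproduces the error. The line numbers reported are positions in the file and may not match the record number Drill prints.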

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)