[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645298#comment-15645298 ]

Subbu Srinivasan commented on DRILL-4653:
-----------------------------------------

Will look at those JSON issues shortly.

On Thu, Nov 3, 2016 at 11:51 AM, Khurram Faraaz (JIRA)

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: 1.9.0
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should keep going after ignoring the bad records. Something
> similar to a setting of (ignore.malformed.json) would help.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630420#comment-15630420 ]

Subbu Srinivasan commented on DRILL-4653:
-----------------------------------------

No. The default mode has to be off; that was the consensus of the community during discussions.

On Wed, Nov 2, 2016 at 12:03 PM, Khurram Faraaz (JIRA)
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629493#comment-15629493 ]

Subbu Srinivasan commented on DRILL-4653:
-----------------------------------------

Did you set store.json.reader.skip_invalid_records to true before running your tests?

On Wed, Nov 2, 2016 at 4:30 AM, Khurram Faraaz (JIRA)
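For anyone who wants to turn the option on from a Java client, here is a minimal sketch. It assumes the option named in the comment above can be set per session with ALTER SESSION and that the caller already holds an open JDBC connection to Drill. The class and method names are made up for illustration and are not part of Drill.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: builds and runs an ALTER SESSION statement for the option named above. */
class SkipInvalidJsonOption {

    // Option name as quoted in the comment; everything else in this class is hypothetical.
    static final String OPTION = "store.json.reader.skip_invalid_records";

    /** Returns the SQL text. The option name is quoted with backticks because it contains dots. */
    static String alterSessionSql(boolean enabled) {
        return "ALTER SESSION SET `" + OPTION + "` = " + enabled;
    }

    /** Applies the option on an already-open connection; opening the connection is left to the caller. */
    static void apply(Connection connection, boolean enabled) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(alterSessionSql(enabled));
        }
    }

    public static void main(String[] args) {
        // Prints the statement that apply() would send.
        System.out.println(alterSessionSql(true));
    }
}
```

Since the default is off, the statement has to be issued in the same session that runs the test queries.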
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627009#comment-15627009 ]

Subbu Srinivasan commented on DRILL-4653:
-----------------------------------------

Yes, will do.
[jira] [Commented] (DRILL-4719) Need To Support IAM role based access for supporting Amazon S3
[ https://issues.apache.org/jira/browse/DRILL-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332707#comment-15332707 ]

subbu srinivasan commented on DRILL-4719:
-----------------------------------------

There is already a fix for this from the Hadoop Common project. In core-site.xml, set the following:

    <property>
      <name>fs.s3a.aws.credentials.provider</name>
      <value>com.amazonaws.auth.DefaultAWSCredentialsProviderChain</value>
    </property>

The access key/ID settings are no longer required.

> Need To Support IAM role based access for supporting Amazon S3
> --------------------------------------------------------------
>
>                 Key: DRILL-4719
>                 URL: https://issues.apache.org/jira/browse/DRILL-4719
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>              Labels: security
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We need the Amazon secret access ID/credentials as part of core-site.xml.
> This is not ideal in many deployments; we would rather use IAM roles to
> access S3.
[jira] [Created] (DRILL-4719) Need To Support IAM role based access for supporting Amazon S3
subbu srinivasan created DRILL-4719:
---------------------------------------

             Summary: Need To Support IAM role based access for supporting Amazon S3
                 Key: DRILL-4719
                 URL: https://issues.apache.org/jira/browse/DRILL-4719
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Other
    Affects Versions: 1.6.0
            Reporter: subbu srinivasan

We need the Amazon secret access ID/credentials as part of core-site.xml.
This is not ideal in many deployments; we would rather use IAM roles to access S3.
[jira] [Updated] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

subbu srinivasan updated DRILL-4653:
------------------------------------
    Reviewer: Deneche A. Hakim
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315948#comment-15315948 ]

Subbu Srinivasan commented on DRILL-4653:
-----------------------------------------

Hi Deneche,

Quick question: do you know whether we have any documentation on how Drill downloads and processes files from S3? Presumably it uses a location on disk. Where is that, and how should it be configured in a production environment?

On Wed, May 4, 2016 at 10:24 AM, Deneche A. Hakim (JIRA)
[jira] [Created] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
subbu srinivasan created DRILL-4653:
---------------------------------------

             Summary: Malformed JSON should not stop the entire query from progressing
                 Key: DRILL-4653
                 URL: https://issues.apache.org/jira/browse/DRILL-4653
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - JSON
    Affects Versions: 1.6.0
            Reporter: subbu srinivasan

Currently a Drill query terminates on the first invalid JSON line it encounters.
Drill should keep going after ignoring the bad records. Something similar to a setting of (ignore.malformed.json) would help.
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269945#comment-15269945 ]

subbu srinivasan commented on DRILL-4653:
-----------------------------------------

Folks,

I went through the JSON parsing code. The main call for JSON deserialization is in JSONReader, which is called from JSONRecordParser. The issue is that handleAndRaise is called for every caught exception.

Would the proposal below be acceptable to the community? The proposal is to catch the IOException and not bail out:

    try {
      outside: while (recordCount < BaseValueVector.INITIAL_VALUE_ALLOCATION) {
        try {
          writer.setPosition(recordCount);
          write = jsonReader.write(writer);

          if (write == ReadState.WRITE_SUCCEED) {
            // logger.debug("Wrote record.");
            recordCount++;
          } else {
            // logger.debug("Exiting.");
            break outside;
          }
        } catch (IOException ex) {
          // Skip the bad record and keep reading instead of failing the query.
          logger.error("Ignoring record. Error parsing JSON: ", ex);
          ++parseErrorCount;
        }
      }
      // ... remainder of the enclosing try block not shown
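The skip-and-count idea in the proposal can be tried outside Drill with a small stand-alone sketch. Everything here is a stand-in: parseRecord is a hypothetical validator, not Drill's JSONReader, and the skipInvalidRecords flag only mirrors the kind of off-by-default setting the issue asks for.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch of "catch the IOException and keep going"; not Drill code. */
class SkipBadRecordsSketch {

    /** Holds the records that parsed plus a count of the ones that were skipped. */
    static final class Result {
        final List<String> records = new ArrayList<>();
        int parseErrorCount;
    }

    /** Hypothetical parser: accepts only lines that start with '{' and end with '}'. */
    static String parseRecord(String line) throws IOException {
        String trimmed = line.trim();
        if (!(trimmed.startsWith("{") && trimmed.endsWith("}"))) {
            throw new IOException("Malformed record: " + line);
        }
        return trimmed;
    }

    /** Reads every line; bad lines are counted and skipped only when skipInvalidRecords is true. */
    static Result readAll(List<String> lines, boolean skipInvalidRecords) throws IOException {
        Result result = new Result();
        for (String line : lines) {
            try {
                result.records.add(parseRecord(line));
            } catch (IOException ex) {
                if (!skipInvalidRecords) {
                    throw ex; // strict mode: the first bad record fails the whole read
                }
                result.parseErrorCount++; // lenient mode: note the error and continue
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList("{\"a\": 1}", "{\"a\": ", "{\"a\": 3}");
        Result result = readAll(lines, true);
        // prints "2 records, 1 skipped"
        System.out.println(result.records.size() + " records, " + result.parseErrorCount + " skipped");
    }
}
```

Keeping the count next to the records lets the caller report how much input was dropped, which matters once skipping is silent by design.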
[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column
[ https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269918#comment-15269918 ]

subbu srinivasan commented on DRILL-4352:
-----------------------------------------

I have overloaded this bug. Should I open a separate one for Parquet?

> Query fails on single corrupted parquet column
> ----------------------------------------------
>
>                 Key: DRILL-4352
>                 URL: https://issues.apache.org/jira/browse/DRILL-4352
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Monitoring, Storage - Parquet
>    Affects Versions: 1.4.0
>            Reporter: F Méthot
>
> Getting this error when querying a corrupted Parquet file.
> Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5)
> Fragment 1:9
> A single corrupt file among 1000s will cause a query to break.
> Encountering a corrupt file should be logged and not spoil a query.
> It would have been useful if the log had clearly specified which
> Parquet file is causing the issue.
> Response from Ted Dunning:
> This is a lot like the problem of encountering bad lines in a line-oriented
> file such as CSV or JSON.
> Drill doesn't currently have a good mechanism for skipping bad input. Or
> rather, it has reasonably good mechanisms, but it doesn't use them well.
> I think that this is a very reasonable extension of the problem of dealing
> with individual bad records and should be handled somehow by the parquet
> scanner.
[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column
[ https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269916#comment-15269916 ]

subbu srinivasan commented on DRILL-4352:
-----------------------------------------

Folks,

I went through the JSON parsing code. The main call for JSON deserialization is in JSONReader, which is called from JSONRecordParser. The issue is that handleAndRaise is called for every caught exception.

Would the proposal below be acceptable to the community? The proposal is to catch the IOException and not bail out:

    try {
      outside: while (recordCount < BaseValueVector.INITIAL_VALUE_ALLOCATION) {
        try {
          writer.setPosition(recordCount);
          write = jsonReader.write(writer);

          if (write == ReadState.WRITE_SUCCEED) {
            // logger.debug("Wrote record.");
            recordCount++;
          } else {
            // logger.debug("Exiting.");
            break outside;
          }
        } catch (IOException ex) {
          // Skip the bad record and keep reading instead of failing the query.
          logger.error("Ignoring record. Error parsing JSON: ", ex);
          ++parseErrorCount;
        }
      }
      // ... remainder of the enclosing try block not shown
[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column
[ https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269615#comment-15269615 ]

subbu srinivasan commented on DRILL-4352:
-----------------------------------------

This is a valid issue. Is anyone working on this?
[jira] [Created] (DRILL-4651) S3 connector and empty bucket issue
subbu srinivasan created DRILL-4651:
---------------------------------------

             Summary: S3 connector and empty bucket issue
                 Key: DRILL-4651
                 URL: https://issues.apache.org/jira/browse/DRILL-4651
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Information Schema
    Affects Versions: 1.6.0
            Reporter: subbu srinivasan

show schemas does not list a registered S3 plugin if its bucket is empty. This is in embedded mode.

Steps to reproduce:
- Go to http://localhost:8047/storage
- Add an S3 plugin (make sure the bucket is empty)
- show schemas will not show information about the S3 plugin/workspaces
- Add a test file to the bucket, and show schemas will show the plugin/workspace
[jira] [Commented] (DRILL-4641) Support for lzo compression
[ https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269325#comment-15269325 ]

subbu srinivasan commented on DRILL-4641:
-----------------------------------------

Can we add this to the documentation and resolve the issue?

> Support for lzo compression
> ---------------------------
>
>                 Key: DRILL-4641
>                 URL: https://issues.apache.org/jira/browse/DRILL-4641
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: Future
>         Environment: Not specific to platform
>            Reporter: subbu srinivasan
>
> Would love support for querying lzo compressed files.
[jira] [Commented] (DRILL-4641) Support for lzo compression
[ https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267595#comment-15267595 ]

subbu srinivasan commented on DRILL-4641:
-----------------------------------------

Jason,

I needed to make the following config changes to get it working.

- Modify core-site.xml to include the following. The property specifies the list of codecs that will be exposed by the compression interface (org.apache.hadoop.io.compress.CompressionCodecFactory and org.apache.hadoop.io.compress.CompressionCodec):

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>

- Download the LZO Java compression JARs: lzo-hadoop-1.0.5.jar and lzo-core-1.0.5.jar

- Define the extension appropriately in the storage plugin:

    "json": {
      "type": "json",
      "extensions": [
        "lzo"
      ]
    },

This got me going.
[jira] [Commented] (DRILL-4641) Support for lzo compression
[ https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262902#comment-15262902 ]

subbu srinivasan commented on DRILL-4641:
-----------------------------------------

I am trying to query LZO-compressed JSON files. I tried this setting:

    "json": {
      "type": "json",
      "extensions": [
        "lzo"
      ]
    },

0: jdbc:drill:zk=local> select count(*) from test ;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ((CTRL-CHAR, code 137)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

File  /test/1459980024674.lzo
Record  1
Column  2
Fragment 0:0
[jira] [Created] (DRILL-4641) Support for lzo compression
subbu srinivasan created DRILL-4641:
---------------------------------------

             Summary: Support for lzo compression
                 Key: DRILL-4641
                 URL: https://issues.apache.org/jira/browse/DRILL-4641
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Other
    Affects Versions: Future
         Environment: Not specific to platform
            Reporter: subbu srinivasan

Would love support for querying lzo compressed files.