[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-11-07 Thread Subbu Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645298#comment-15645298
 ] 

Subbu Srinivasan commented on DRILL-4653:
-

Will look at those JSON issues shortly.


On Thu, Nov 3, 2016 at 11:51 AM, Khurram Faraaz (JIRA) 



> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: 1.9.0
>
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-11-02 Thread Subbu Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630420#comment-15630420
 ] 

Subbu Srinivasan commented on DRILL-4653:
-

No. The default mode has to be off; that was the consensus of the community
during discussions.


On Wed, Nov 2, 2016 at 12:03 PM, Khurram Faraaz (JIRA) 



> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: 1.9.0
>
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-11-02 Thread Subbu Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629493#comment-15629493
 ] 

Subbu Srinivasan commented on DRILL-4653:
-

Did you set store.json.reader.skip_invalid_records to true before running
your tests?
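
For anyone reproducing this from Java, the option can be switched on for the session before the test query runs. The following is only a minimal sketch: it assumes a JDBC Connection to Drill has already been opened elsewhere, and the class and method names are hypothetical. Only the option name comes from this thread.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper; not part of Drill. */
final class SkipInvalidJsonOption {

    // Option name as quoted in the comment above.
    static final String OPTION = "store.json.reader.skip_invalid_records";

    /** Builds the ALTER SESSION statement that turns the option on or off. */
    static String alterSessionSql(boolean enabled) {
        return "ALTER SESSION SET `" + OPTION + "` = " + enabled;
    }

    /** Applies the option on an already-open connection (assumed to point at Drill). */
    static void apply(Connection connection, boolean enabled) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(alterSessionSql(enabled));
        }
    }
}
```

Because the option is set per session here, it has to be applied on the same connection that later runs the query.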

On Wed, Nov 2, 2016 at 4:30 AM, Khurram Faraaz (JIRA) 



> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: 1.9.0
>
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-11-01 Thread Subbu Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627009#comment-15627009
 ] 

Subbu Srinivasan commented on DRILL-4653:
-

Yes will do.




> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: 1.9.0
>
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Commented] (DRILL-4719) Need To Support IAM role based access for supporting Amazon S3

2016-06-15 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332707#comment-15332707
 ] 

subbu srinivasan commented on DRILL-4719:
-

There is already a fix for this in the Hadoop Common project.

In core-site.xml, set the following:

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.DefaultAWSCredentialsProviderChain</value>
</property>

The access key ID and secret key are then no longer required.





> Need To Support IAM role based access for supporting Amazon S3
> --
>
> Key: DRILL-4719
> URL: https://issues.apache.org/jira/browse/DRILL-4719
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
>  Labels: security
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently we need to put the Amazon access key ID and secret key in core-site.xml.
> This is not ideal in many deployments; we would rather use IAM roles to
> access S3.





[jira] [Created] (DRILL-4719) Need To Support IAM role based access for supporting Amazon S3

2016-06-13 Thread subbu srinivasan (JIRA)
subbu srinivasan created DRILL-4719:
---

 Summary: Need To Support IAM role based access for supporting 
Amazon S3
 Key: DRILL-4719
 URL: https://issues.apache.org/jira/browse/DRILL-4719
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.6.0
Reporter: subbu srinivasan


Currently we need to put the Amazon access key ID and secret key in core-site.xml.
This is not ideal in many deployments; we would rather use IAM roles to
access S3.







[jira] [Updated] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-06-13 Thread subbu srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

subbu srinivasan updated DRILL-4653:

Reviewer: Deneche A. Hakim

> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-06-05 Thread Subbu Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315948#comment-15315948
 ] 

Subbu Srinivasan commented on DRILL-4653:
-

Hi Deneche,
Quick question. Do you know whether we have any documentation on how Drill
downloads and processes files from S3? It must be using a location on disk.
Where is that, and how do we configure it in a production environment?

On Wed, May 4, 2016 at 10:24 AM, Deneche A. Hakim (JIRA) 



> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Created] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-05-03 Thread subbu srinivasan (JIRA)
subbu srinivasan created DRILL-4653:
---

 Summary: Malformed JSON should not stop the entire query from 
progressing
 Key: DRILL-4653
 URL: https://issues.apache.org/jira/browse/DRILL-4653
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON
Affects Versions: 1.6.0
Reporter: subbu srinivasan


Currently a Drill query terminates on the first invalid JSON line it encounters.
Drill should skip the bad records and continue processing. A setting along
the lines of ignore.malformed.json would help.







[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-05-03 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269945#comment-15269945
 ] 

subbu srinivasan commented on DRILL-4653:
-

Folks,
I went through the JSON parsing code. The main call for JSON deserialization
is in JSONReader, which is called from JSONRecordParser. The issue is that
handleAndRaise is called for every caught exception.

Would the proposal below be acceptable to the community?

The proposal is to catch the IOException and not bail out.

  try {
    outside: while (recordCount < BaseValueVector.INITIAL_VALUE_ALLOCATION) {
      try {
        writer.setPosition(recordCount);
        write = jsonReader.write(writer);

        if (write == ReadState.WRITE_SUCCEED) {
          // logger.debug("Wrote record.");
          recordCount++;
        } else {
          // logger.debug("Exiting.");
          break outside;
        }
      } catch (IOException ex) {
        // Skip the malformed record instead of failing the whole query.
        logger.error("Ignoring record. Error parsing JSON: ", ex);
        ++parseErrorCount;
      }
    }
    // The rest of the method, including the existing outer catch, is unchanged.
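
To make the intended behaviour easier to discuss, here is a self-contained toy version of the same "catch and continue" loop. It is only a sketch and not Drill code: parseRecord is a hypothetical stand-in for the real JSON reader, and the skipInvalid flag is an assumed switch that keeps today's fail-fast behaviour unless it is explicitly turned on.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of skipping bad records; none of these names exist in Drill. */
final class SkipInvalidRecordsSketch {

    /** Result of one batch read: the records kept and how many were skipped. */
    static final class BatchResult {
        final List<String> records = new ArrayList<>();
        int parseErrorCount = 0;
    }

    /** Hypothetical stand-in for a JSON parser: accepts only lines shaped like {...}. */
    static String parseRecord(String line) throws IOException {
        String trimmed = line.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IOException("Malformed record: " + line);
        }
        return trimmed;
    }

    /** Reads up to maxRecords records, skipping malformed lines only when skipInvalid is true. */
    static BatchResult readBatch(List<String> lines, int maxRecords, boolean skipInvalid)
            throws IOException {
        BatchResult result = new BatchResult();
        for (String line : lines) {
            if (result.records.size() >= maxRecords) {
                break; // batch is full
            }
            try {
                result.records.add(parseRecord(line));
            } catch (IOException ex) {
                if (!skipInvalid) {
                    throw ex; // fail-fast: the first bad record aborts the read
                }
                result.parseErrorCount++; // opt-in: count the bad record and move on
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList("{\"a\": 1}", "{\"a\": ", "{\"a\": 3}");
        BatchResult result = readBatch(lines, 4096, true);
        System.out.println(result.records.size() + " kept, "
                + result.parseErrorCount + " skipped");
    }
}
```

Keeping a count of skipped records, as the proposal does with parseErrorCount, at least lets the caller report how much input was dropped.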

> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should skip the bad records and continue processing. A setting along
> the lines of ignore.malformed.json would help.





[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column

2016-05-03 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269918#comment-15269918
 ] 

subbu srinivasan commented on DRILL-4352:
-

I overloaded this bug. Should I open a separate one for Parquet?

> Query fails on single corrupted parquet column
> --
>
> Key: DRILL-4352
> URL: https://issues.apache.org/jira/browse/DRILL-4352
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring, Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: F Méthot
>
> Getting this error when querying a corrupted Parquet file.
> Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5)
> Fragment 1:9
> A single corrupt file among 1000s will cause a query to break.
> Encountering a corrupt file should be logged and should not spoil a query.
> It would have been useful if the log clearly specified which 
> parquet file is causing the issue.
> Response from Ted Dunning:
> This is a lot like the problem of encountering bad lines in a line oriented 
> file such as CSV or JSON. 
> Drill doesn't currently have a good mechanism for skipping bad input. Or 
> rather, it has reasonably good mechanisms, but it doesn't use them well.
> I think that this is a very reasonable extension of the problem of dealing 
> with individual bad records and should be handled somehow by the parquet 
> scanner.





[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column

2016-05-03 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269916#comment-15269916
 ] 

subbu srinivasan commented on DRILL-4352:
-

Folks,
I went through the JSON parsing code. The main call for JSON deserialization
is in JSONReader, which is called from JSONRecordParser. The issue is that
handleAndRaise is called for every caught exception.

Would the proposal below be acceptable to the community?

The proposal is to catch the IOException and not bail out.

  try {
    outside: while (recordCount < BaseValueVector.INITIAL_VALUE_ALLOCATION) {
      try {
        writer.setPosition(recordCount);
        write = jsonReader.write(writer);

        if (write == ReadState.WRITE_SUCCEED) {
          // logger.debug("Wrote record.");
          recordCount++;
        } else {
          // logger.debug("Exiting.");
          break outside;
        }
      } catch (IOException ex) {
        // Skip the malformed record instead of failing the whole query.
        logger.error("Ignoring record. Error parsing JSON: ", ex);
        ++parseErrorCount;
      }
    }
    // The rest of the method, including the existing outer catch, is unchanged.



> Query fails on single corrupted parquet column
> --
>
> Key: DRILL-4352
> URL: https://issues.apache.org/jira/browse/DRILL-4352
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring, Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: F Méthot
>
> Getting this error when querying a corrupted Parquet file.
> Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5)
> Fragment 1:9
> A single corrupt file among 1000s will cause a query to break.
> Encountering a corrupt file should be logged and should not spoil a query.
> It would have been useful if the log clearly specified which 
> parquet file is causing the issue.
> Response from Ted Dunning:
> This is a lot like the problem of encountering bad lines in a line oriented 
> file such as CSV or JSON. 
> Drill doesn't currently have a good mechanism for skipping bad input. Or 
> rather, it has reasonably good mechanisms, but it doesn't use them well.
> I think that this is a very reasonable extension of the problem of dealing 
> with individual bad records and should be handled somehow by the parquet 
> scanner.





[jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column

2016-05-03 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269615#comment-15269615
 ] 

subbu srinivasan commented on DRILL-4352:
-

This is a valid issue. Is anyone working on this?

> Query fails on single corrupted parquet column
> --
>
> Key: DRILL-4352
> URL: https://issues.apache.org/jira/browse/DRILL-4352
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring, Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: F Méthot
>
> Getting this error when querying a corrupted Parquet file.
> Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5)
> Fragment 1:9
> A single corrupt file among 1000s will cause a query to break.
> Encountering a corrupt file should be logged and should not spoil a query.
> It would have been useful if the log clearly specified which 
> parquet file is causing the issue.
> Response from Ted Dunning:
> This is a lot like the problem of encountering bad lines in a line oriented 
> file such as CSV or JSON. 
> Drill doesn't currently have a good mechanism for skipping bad input. Or 
> rather, it has reasonably good mechanisms, but it doesn't use them well.
> I think that this is a very reasonable extension of the problem of dealing 
> with individual bad records and should be handled somehow by the parquet 
> scanner.





[jira] [Created] (DRILL-4651) S3 connector and empty bucket issue

2016-05-03 Thread subbu srinivasan (JIRA)
subbu srinivasan created DRILL-4651:
---

 Summary: S3 connector and empty bucket issue
 Key: DRILL-4651
 URL: https://issues.apache.org/jira/browse/DRILL-4651
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Information Schema
Affects Versions: 1.6.0
Reporter: subbu srinivasan


show schemas does not list a registered S3 plugin if the bucket is empty.
This is in embedded mode.

Steps to reproduce:

- Go to http://localhost:8047/storage
- Add an S3 plugin (make sure the bucket is empty)
- show schemas does not list the S3 plugin or its workspaces
- Add a test file to the bucket; show schemas now lists the plugin and workspace





[jira] [Commented] (DRILL-4641) Support for lzo compression

2016-05-03 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269325#comment-15269325
 ] 

subbu srinivasan commented on DRILL-4641:
-

Can we add this to the documentation and resolve the issue?

> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying LZO-compressed files.





[jira] [Commented] (DRILL-4641) Support for lzo compression

2016-05-02 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267595#comment-15267595
 ] 

subbu srinivasan commented on DRILL-4641:
-

Jason,
The following config changes are needed to make it work.

- Modify core-site.xml to include the following. The property specifies the 
list of codecs that will be exposed by the compression interface 
(org.apache.hadoop.io.compress.CompressionCodecFactory and 
org.apache.hadoop.io.compress.CompressionCodec).

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

- Download the LZO Java compression jars: lzo-hadoop-1.0.5.jar and 
lzo-core-1.0.5.jar

- Define the extension in the storage plugin:
 "json": {
   "type": "json",
   "extensions": [
     "lzo"
   ]
 },

This got me going.





> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying LZO-compressed files.





[jira] [Commented] (DRILL-4641) Support for lzo compression

2016-04-28 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262902#comment-15262902
 ] 

subbu srinivasan commented on DRILL-4641:
-

I am trying to query LZO-compressed JSON files.

I tried this setting:
"json": {
  "type": "json",
  "extensions": [
    "lzo"
  ]
},



0: jdbc:drill:zk=local> select count(*) from test ;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ((CTRL-CHAR, 
code 137)): expected a valid value (number, String, array, object, 'true', 
'false' or 'null')

File  /test/1459980024674.lzo
Record  1
Column  2
Fragment 0:0
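
The "CTRL-CHAR, code 137" in this error is consistent with the JSON reader having been handed the raw compressed bytes: 137 is 0x89, and files written by lzop begin with 0x89 followed by the ASCII letters "LZO". A minimal sketch of that check, assuming the file was produced by lzop; the class and method names are made up for illustration and only the first four signature bytes are compared.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper; not part of Drill or Hadoop. */
final class LzopMagicCheck {

    // First four bytes of the lzop file signature: 0x89 followed by "LZO".
    private static final byte[] LZOP_PREFIX = {(byte) 0x89, 'L', 'Z', 'O'};

    /** Returns true when the given header bytes start with the lzop signature prefix. */
    static boolean looksLikeLzop(byte[] header) {
        if (header == null || header.length < LZOP_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < LZOP_PREFIX.length; i++) {
            if (header[i] != LZOP_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] compressed = {(byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0d, 0x0a};
        byte[] plainJson = "{\"a\": 1}".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeLzop(compressed)); // true
        System.out.println(looksLikeLzop(plainJson));  // false
        System.out.println(compressed[0] & 0xFF);      // 137, the code in the error
    }
}
```

If the first byte of the file really is 137, the extension mapping alone was not enough and the data reached the JSON parser without being decompressed.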


> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying LZO-compressed files.





[jira] [Created] (DRILL-4641) Support for lzo compression

2016-04-25 Thread subbu srinivasan (JIRA)
subbu srinivasan created DRILL-4641:
---

 Summary: Support for lzo compression
 Key: DRILL-4641
 URL: https://issues.apache.org/jira/browse/DRILL-4641
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: Future
 Environment: Not specific to platform
Reporter: subbu srinivasan


Would love support for querying LZO-compressed files.


