[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming

2018-03-08 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391501#comment-16391501 ] Gaurav Shah commented on SPARK-18165: - Databricks has it implemented; not sure why it is exclusive

[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305612#comment-16305612 ] Gaurav Shah commented on SPARK-13127: - I am surprised people haven't hit

[jira] [Comment Edited] (SPARK-22248) spark marks all columns as null when its unable to parse single column

2017-10-11 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201433#comment-16201433 ] Gaurav Shah edited comment on SPARK-22248 at 10/12/17 4:49 AM: --- [~maropu] I

[jira] [Comment Edited] (SPARK-22248) spark marks all columns as null when its unable to parse single column

2017-10-11 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200074#comment-16200074 ] Gaurav Shah edited comment on SPARK-22248 at 10/12/17 4:50 AM: --- We can work

[jira] [Commented] (SPARK-22248) spark marks all columns as null when its unable to parse single column

2017-10-11 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201433#comment-16201433 ] Gaurav Shah commented on SPARK-22248: - [~maropu] I am not sure on CSV, but on JSON we tokenize the

[jira] [Updated] (SPARK-22248) spark marks all columns as null when its unable to parse single column

2017-10-11 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Shah updated SPARK-22248: Summary: spark marks all columns as null when its unable to parse single column (was: spark marks

[jira] [Commented] (SPARK-22248) spark marks all columns as null when its unable to parse one column

2017-10-11 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200074#comment-16200074 ] Gaurav Shah commented on SPARK-22248: - We can work on a patch request unless there was an explicit

[jira] [Comment Edited] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes

2017-10-06 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194542#comment-16194542 ] Gaurav Shah edited comment on SPARK-20712 at 10/6/17 12:38 PM: --- [~maver1ck]

[jira] [Commented] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes

2017-10-06 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194542#comment-16194542 ] Gaurav Shah commented on SPARK-20712: - [~maver1ck] I am facing the same issue, I tried executing the

[jira] [Commented] (SPARK-20462) Spark-Kinesis Direct Connector

2017-09-13 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164429#comment-16164429 ] Gaurav Shah commented on SPARK-20462: - Flink has implemented in similar fashion

[jira] [Commented] (SPARK-20462) Spark-Kinesis Direct Connector

2017-08-29 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146614#comment-16146614 ] Gaurav Shah commented on SPARK-20462: - related blog post:

[jira] [Commented] (SPARK-17423) Support IGNORE NULLS option in Window functions

2017-08-18 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133095#comment-16133095 ] Gaurav Shah commented on SPARK-17423: - is there a way to get the second non-null value? changing the

[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2017-08-15 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128318#comment-16128318 ] Gaurav Shah commented on SPARK-4502: [~marmbrus] Do you have some time to review this pull request ?

[jira] [Created] (SPARK-20370) create external table on read only location fails

2017-04-18 Thread Gaurav Shah (JIRA)
Gaurav Shah created SPARK-20370: --- Summary: create external table on read only location fails Key: SPARK-20370 URL: https://issues.apache.org/jira/browse/SPARK-20370 Project: Spark Issue Type:

[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming

2017-03-18 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931574#comment-15931574 ] Gaurav Shah commented on SPARK-18165: - anything that I can do to help for this feature ? > Kinesis

[jira] [Commented] (SPARK-19304) Kinesis checkpoint recovery is 10x slow

2017-02-07 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856772#comment-15856772 ] Gaurav Shah commented on SPARK-19304: - went ahead with a compromised approach for better code. We can

[jira] [Commented] (SPARK-19304) Kinesis checkpoint recovery is 10x slow

2017-01-31 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848042#comment-15848042 ] Gaurav Shah commented on SPARK-19304: - [~srowen] tried working through the code but unsure how to

[jira] (SPARK-9215) Implement WAL-free Kinesis receiver that give at-least once guarantee

2017-01-29 Thread Gaurav Shah (JIRA)
Gaurav Shah commented on SPARK-9215

[jira] [Commented] (SPARK-9215) Implement WAL-free Kinesis receiver that give at-least once guarantee

2017-01-26 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15841011#comment-15841011 ] Gaurav Shah commented on SPARK-9215: [~tdas] ping > Implement WAL-free Kinesis receiver that give

[jira] [Commented] (SPARK-19304) Kinesis checkpoint recovery is 10x slow

2017-01-26 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15841009#comment-15841009 ] Gaurav Shah commented on SPARK-19304: - There are two issues in `KinesisSequenceRangeIterator.getNext`

[jira] [Created] (SPARK-19304) Kinesis checkpoint recovery is 10x slow

2017-01-19 Thread Gaurav Shah (JIRA)
Gaurav Shah created SPARK-19304: --- Summary: Kinesis checkpoint recovery is 10x slow Key: SPARK-19304 URL: https://issues.apache.org/jira/browse/SPARK-19304 Project: Spark Issue Type: Bug

[jira] [Closed] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2017-01-09 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Shah closed SPARK-17527. --- Resolution: Invalid the issue occurred in the first place due to using schema generated by spark 2

[jira] [Comment Edited] (SPARK-9215) Implement WAL-free Kinesis receiver that give at-least once guarantee

2017-01-06 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804425#comment-15804425 ] Gaurav Shah edited comment on SPARK-9215 at 1/7/17 2:04 AM: [~tdas] I know

[jira] [Commented] (SPARK-9215) Implement WAL-free Kinesis receiver that give at-least once guarantee

2017-01-06 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804425#comment-15804425 ] Gaurav Shah commented on SPARK-9215: [~tdas] I know this is an old pull request but was still

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2016-10-10 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562171#comment-15562171 ] Gaurav Shah commented on SPARK-17593: - added a detailed explanation and solution here

[jira] [Commented] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2016-10-10 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562162#comment-15562162 ] Gaurav Shah commented on SPARK-17527: - I think what has happened is I created schema using spark

[jira] [Comment Edited] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2016-09-23 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515948#comment-15515948 ] Gaurav Shah edited comment on SPARK-17527 at 9/23/16 9:31 AM: -- I am unable

[jira] [Commented] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2016-09-23 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515948#comment-15515948 ] Gaurav Shah commented on SPARK-17527: - I am unable to create a smaller script that can reproduce this

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503675#comment-15503675 ] Gaurav Shah commented on SPARK-17593: - I definitely agree that flattening out will help, ( not sure

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503668#comment-15503668 ] Gaurav Shah commented on SPARK-17593: - Thanks [~ste...@apache.org] S3 is definitely slower than hdfs

[jira] [Updated] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Shah updated SPARK-17593: Description: let's say we have the following partitioned data: {code} events_v3 --

[jira] [Updated] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Shah updated SPARK-17593: Description: let's say we have the following partitioned data: {code} events_v3 --

[jira] [Commented] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503581#comment-15503581 ] Gaurav Shah commented on SPARK-17527: - Can I do that in two days ? stuck with something else as of

[jira] [Commented] (SPARK-16121) ListingFileCatalog does not list in parallel anymore

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503278#comment-15503278 ] Gaurav Shah commented on SPARK-16121: - Thanks [~srowen] > ListingFileCatalog does not list in

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503207#comment-15503207 ] Gaurav Shah commented on SPARK-17593: - Thanks [~srowen] tried after your comment, but that didn't

[jira] [Issue Comment Deleted] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Shah updated SPARK-17593: Comment: was deleted (was: Thanks [~srowen] my spark code does use `s3n` ) > list files on s3

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503164#comment-15503164 ] Gaurav Shah commented on SPARK-17593: - Thanks [~srowen] my spark code does use `s3n` > list files

[jira] [Created] (SPARK-17593) list files on s3 very slow

2016-09-19 Thread Gaurav Shah (JIRA)
Gaurav Shah created SPARK-17593: --- Summary: list files on s3 very slow Key: SPARK-17593 URL: https://issues.apache.org/jira/browse/SPARK-17593 Project: Spark Issue Type: Bug Affects

[jira] [Commented] (SPARK-16121) ListingFileCatalog does not list in parallel anymore

2016-09-16 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496234#comment-15496234 ] Gaurav Shah commented on SPARK-16121: - [~mengxr] was this fixed in 2.0.0 or is it planned for 2.0.1,

[jira] [Created] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails

2016-09-13 Thread Gaurav Shah (JIRA)
Gaurav Shah created SPARK-17527: --- Summary: mergeSchema with `_OPTIONAL_` metadata fails Key: SPARK-17527 URL: https://issues.apache.org/jira/browse/SPARK-17527 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2016-07-28 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397654#comment-15397654 ] Gaurav Shah commented on SPARK-2984: [~dmaverick] Can you tell a little more on sequential import vs

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2016-06-05 Thread Gaurav Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316222#comment-15316222 ] Gaurav Shah commented on SPARK-2984: seeing similar errors with spark 1.6 {noformat} App > Caused