[jira] [Updated] (SPARK-33605) Add GCS FS/connector to the dependencies akin to S3
[ https://issues.apache.org/jira/browse/SPARK-33605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rafal Wojdyla updated SPARK-33605:
----------------------------------
Description:

Spark comes with some S3 batteries included, which makes it easier to use with S3; for GCS to work, users are required to configure the jars manually. This is especially problematic for Python users, who may not be accustomed to Java dependencies. An example workaround for PySpark is [pyspark_gcs|https://github.com/ravwojdyla/pyspark_gcs]. If we included the [GCS connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage], it would make things easier for GCS users.

Please let me know what you think.

was:

Spark comes with some S3 batteries included, which makes it easier to use with S3; for GCS to work, users are required to configure the jars manually. This is especially problematic for Python users, who may not be accustomed to Java dependencies. An example workaround for PySpark is [pyspark_gcs|https://github.com/ravwojdyla/pyspark_gcs]. If we included the [GCS connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage], it would make things easier for GCS users.

The fix could be to:
* add the [gcs-connector dependency|https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector] to the {{hadoop-cloud}} module
* test that there are no problematic classpath conflicts
* test that the PySpark package includes the GCS connector in its jars

Please let me know what you think.

> Add GCS FS/connector to the dependencies akin to S3
> ---------------------------------------------------
>
>                 Key: SPARK-33605
>                 URL: https://issues.apache.org/jira/browse/SPARK-33605
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Rafal Wojdyla
>            Priority: Major
>
> Spark comes with some S3 batteries included, which makes it easier to use with S3; for GCS to work, users are required to configure the jars manually. This is especially problematic for Python users, who may not be accustomed to Java dependencies. An example workaround for PySpark is [pyspark_gcs|https://github.com/ravwojdyla/pyspark_gcs]. If we included the [GCS connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage], it would make things easier for GCS users.
>
> Please let me know what you think.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
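To make the current pain concrete: the settings below are the kind of wiring a PySpark user must supply by hand today before `gs://` paths work. This is an illustrative sketch; the Maven coordinates and class names follow the public GCS connector documentation, but the exact version and helper function are assumptions, not something Spark ships.

```python
def gcs_manual_conf(connector_version="hadoop3-2.2.0"):
    """Spark settings a user must currently set by hand to read gs:// paths.

    Coordinates and class names are illustrative, taken from the GCS
    connector docs; with the connector bundled in Spark, this boilerplate
    would largely disappear.
    """
    return {
        # Fetch the GCS connector at startup (version is an assumption).
        "spark.jars.packages":
            f"com.google.cloud.bigdataoss:gcs-connector:{connector_version}",
        # Register the gs:// filesystem implementations with Hadoop.
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    }

# Applying it (requires pyspark; shown for illustration only):
# builder = SparkSession.builder
# for key, value in gcs_manual_conf().items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

The pyspark_gcs package linked above automates essentially this kind of setup for users.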
[jira] [Updated] (SPARK-33605) Add GCS FS/connector to the dependencies akin to S3
[ https://issues.apache.org/jira/browse/SPARK-33605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rafal Wojdyla updated SPARK-33605:
----------------------------------
Description:

Spark comes with [hadoop-aws|https://github.com/apache/spark/blob/cb3fa6c9368e64184a5f7b19688181d11de9511c/hadoop-cloud/pom.xml#L74-L77] batteries included, which makes it easy to use with S3; for GCS to work, users are required to configure the jars manually. This is especially problematic for Python users, who may not be accustomed to Java dependencies. An example workaround for PySpark is [pyspark_gcs|https://github.com/ravwojdyla/pyspark_gcs]. If we included the [GCS connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage], it would make things easier for GCS users.

The fix could be to:
* add the [gcs-connector dependency|https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector] to the {{hadoop-cloud}} module
* test that there are no problematic classpath conflicts
* test that the PySpark package includes the GCS connector in its jars

Please let me know what you think.
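The first bullet would amount to a dependency entry in the `hadoop-cloud` POM, alongside the existing `hadoop-aws` one. A sketch of what that could look like (the version and the shaded classifier are assumptions; an actual patch would pin the version via a Maven property and resolve any shading conflicts):

```xml
<!-- Hypothetical addition to hadoop-cloud/pom.xml; version is illustrative. -->
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>hadoop3-2.2.0</version>
  <!-- The shaded artifact bundles its Guava/gRPC deps, reducing classpath conflicts. -->
  <classifier>shaded</classifier>
</dependency>
```

The second and third bullets could then be checked by building the distribution with `-Phadoop-cloud` and verifying that `gcs-connector` appears under `jars/` in the PySpark package.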