[jira] [Commented] (SPARK-17814) spark submit arguments are truncated in yarn-cluster mode

2019-11-06 Thread Juan Ramos Fuentes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-17814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968816#comment-16968816
 ] 

Juan Ramos Fuentes commented on SPARK-17814:


Looks like I'm about 3 years too late, but I just encountered this same issue. 
I'm also passing a string of JSON as an arg to spark-submit and seeing the last 
two curly braces replaced with an empty string. My temporary workaround is to 
avoid having two curly braces next to each other, but I was wondering if you 
have any thoughts on how to solve this issue. [~jerryshao] [~shreyass123]
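For anyone else hitting this: one way to sidestep the brace-stripping entirely 
is to never put raw braces on the spark-submit command line at all. A minimal 
sketch (the helper names {{encode_arg}}/{{decode_arg}} are just illustrative, 
not part of any Spark API) base64-encodes the JSON in the submitting script and 
decodes it inside the application:

```python
import base64
import json

def encode_arg(obj):
    """Serialize a JSON-compatible object to a base64 string.

    The base64 alphabet (A-Z, a-z, 0-9, +, /, =) contains no braces or
    quotes, so nothing in the argument needs shell escaping and nothing
    can be mangled by the YARN launcher.
    """
    return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")

def decode_arg(arg):
    """Decode the base64 argument back into a Python object inside the app."""
    return json.loads(base64.b64decode(arg).decode("utf-8"))

# Same payload as in the original report:
conf = {"mode": "wf", "arrays": {"array": [1]}}
encoded = encode_arg(conf)

# Round-trips cleanly, and the encoded form contains no braces to strip.
assert decode_arg(encoded) == conf
assert "{" not in encoded and "}" not in encoded
```

The application then calls the decode step on {{sys.argv[1]}} (or the 
equivalent in Scala/Java) instead of parsing the JSON directly.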

> spark submit arguments are truncated in yarn-cluster mode
> -
>
> Key: SPARK-17814
> URL: https://issues.apache.org/jira/browse/SPARK-17814
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, YARN
>Affects Versions: 1.6.1
>Reporter: shreyas subramanya
>Priority: Minor
>
> {noformat}
> One of our customers is trying to pass in json through spark-submit as 
> follows:
> spark-submit --verbose --class SimpleClass --master yarn-cluster ./simple.jar 
> "{\"mode\":\"wf\", \"arrays\":{\"array\":[1]}}"
> The application reports the passed arguments as: {"mode":"wf", 
> "arrays":{"array":[1]
> If the same application is submitted in yarn-client mode, as follows:
> spark-submit --verbose --class SimpleClass --master yarn-client ./simple.jar 
> "{\"mode\":\"wf\", \"arrays\":{\"array\":[1]}}"
> The application reports the passed args as: {"mode":"wf", 
> "arrays":{"array":[1]}}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20049) Writing data to Parquet with partitions takes very long after the job finishes

2019-02-04 Thread Juan Ramos Fuentes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760123#comment-16760123
 ] 

Juan Ramos Fuentes commented on SPARK-20049:


Was there ever a fix for this issue? I'm running into the same problem, but 
when writing to S3.

I'll try repartitioning first and see whether that improves performance.
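For the record, the two things I plan to try. This is only a sketch based on 
the common advice for slow {{_temporary}}-to-destination renames, not a 
confirmed fix; {{my_job.py}} and the paths are placeholders:

```shell
# File-output-committer algorithm version 2 (Hadoop 2.7+) moves task output
# directly into the destination directory, skipping the slow sequential
# rename out of _temporary at job commit.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  my_job.py
```

And inside the job, repartitioning by the partition column first, so each 
{{date=*}} directory is written by fewer tasks and fewer small files need 
moving afterwards: {{df.repartition("date").write.partitionBy("date").parquet("dest_dir")}}.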

> Writing data to Parquet with partitions takes very long after the job finishes
> --
>
> Key: SPARK-20049
> URL: https://issues.apache.org/jira/browse/SPARK-20049
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output, PySpark, SQL
>Affects Versions: 2.1.0
> Environment: Spark 2.1.0, CDH 5.8, Python 3.4, Java 8, Debian 
> GNU/Linux 8.7 (jessie)
>Reporter: Jakub Nowacki
>Priority: Minor
>
> I was testing writing a DataFrame to partitioned Parquet files. The command 
> is quite straightforward and the data set is really a sample from a larger 
> data set in Parquet; the job is done in PySpark on YARN and written to HDFS:
> {code}
> # there is column 'date' in df
> df.write.partitionBy("date").parquet("dest_dir")
> {code}
> The reading part took as long as usual, but after the job had been marked as 
> finished in PySpark and the UI, the Python interpreter was still showing it 
> as busy. Indeed, when I checked the HDFS folder I noticed that the files 
> were still being transferred from {{dest_dir/_temporary}} to all the 
> {{dest_dir/date=*}} folders. 
> First of all, it takes much longer than saving the same data set without 
> partitioning. Second, it is done in the background, without any visible 
> progress indication. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
