[jira] [Commented] (SPARK-17814) spark submit arguments are truncated in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968816#comment-16968816 ]

Juan Ramos Fuentes commented on SPARK-17814:
--------------------------------------------

Looks like I'm about 3 years too late, but I just encountered this same issue. I'm also passing a string of JSON as an arg to spark-submit and seeing the last two curly braces being replaced with an empty string. My temporary solution is to avoid having two curly braces next to each other, but I was wondering if you have any thoughts on how to solve this issue. [~jerryshao] [~shreyass123]

> spark submit arguments are truncated in yarn-cluster mode
> ---------------------------------------------------------
>
>                 Key: SPARK-17814
>                 URL: https://issues.apache.org/jira/browse/SPARK-17814
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit, YARN
>    Affects Versions: 1.6.1
>           Reporter: shreyas subramanya
>            Priority: Minor
>
> {noformat}
> One of our customers is trying to pass in JSON through spark-submit as follows:
>
> spark-submit --verbose --class SimpleClass --master yarn-cluster ./simple.jar "{\"mode\":\"wf\", \"arrays\":{\"array\":[1]}}"
>
> The application reports the passed arguments as: {"mode":"wf", "arrays":{"array":[1]
>
> If the same application is submitted in yarn-client mode, as follows:
>
> spark-submit --verbose --class SimpleClass --master yarn-client ./simple.jar "{\"mode\":\"wf\", \"arrays\":{\"array\":[1]}}"
>
> The application reports the passed args as: {"mode":"wf", "arrays":{"array":[1]}}
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
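A more robust variant of the workaround mentioned in the comment is to avoid sending raw braces through the YARN command line at all, by base64-encoding the JSON before spark-submit and decoding it inside the application. This is a minimal sketch, not anything from Spark itself; the helper names `encode_arg`/`decode_arg` are hypothetical:

```python
import base64
import json

def encode_arg(obj):
    """Serialize a JSON-compatible object to a base64 string that is safe to
    pass through spark-submit: the encoded form contains no braces, quotes,
    or backslashes, so nothing in the YARN launch path can mangle it."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")

def decode_arg(arg):
    """Reverse of encode_arg; call this on sys.argv[1] inside the driver."""
    return json.loads(base64.urlsafe_b64decode(arg.encode("ascii")).decode("utf-8"))

payload = {"mode": "wf", "arrays": {"array": [1]}}
safe = encode_arg(payload)
# e.g. spark-submit --master yarn-cluster app.py "<value of safe>"
assert "{" not in safe and "}" not in safe
assert decode_arg(safe) == payload
```

The round trip is lossless, so the driver sees exactly the JSON that was submitted, regardless of how the cluster-mode launcher escapes its arguments.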
[jira] [Commented] (SPARK-20049) Writing data to Parquet with partitions takes very long after the job finishes
[ https://issues.apache.org/jira/browse/SPARK-20049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760123#comment-16760123 ]

Juan Ramos Fuentes commented on SPARK-20049:
--------------------------------------------

Was there ever a fix for this issue? I'm running into the same problem, but when writing to S3. I'll try repartitioning first and see if that improves performance.

> Writing data to Parquet with partitions takes very long after the job finishes
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20049
>                 URL: https://issues.apache.org/jira/browse/SPARK-20049
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, PySpark, SQL
>    Affects Versions: 2.1.0
>        Environment: Spark 2.1.0, CDH 5.8, Python 3.4, Java 8, Debian GNU/Linux 8.7 (jessie)
>           Reporter: Jakub Nowacki
>            Priority: Minor
>
> I was testing writing a DataFrame to partitioned Parquet files. The command is quite straightforward and the data set is really a sample from a larger data set in Parquet; the job is run in PySpark on YARN and written to HDFS:
> {code}
> # there is a column 'date' in df
> df.write.partitionBy("date").parquet("dest_dir")
> {code}
> The reading part took as long as usual, but after the job had been marked as finished in PySpark and in the UI, the Python interpreter was still showing it as busy. Indeed, when I checked the HDFS folder I noticed that the files were still being transferred from {{dest_dir/_temporary}} to all the {{dest_dir/date=*}} folders.
> First of all, it takes much longer than saving the same data set without partitioning. Second, it is done in the background, without visible progress of any kind.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
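The slow phase described above is the output committer moving files out of {{dest_dir/_temporary}} after all tasks finish. Two commonly suggested mitigations, sketched below under the assumption that the DataFrame has a 'date' column as in the report: set the Hadoop FileOutputCommitter to algorithm version 2 (which commits task output directly to the final location instead of renaming it at job end), and repartition by the partition column so each output directory receives fewer, larger files. This is a configuration sketch, not a confirmed fix for this ticket; note that v2 commit has known consistency caveats on object stores such as S3.

```python
from pyspark.sql import SparkSession

# Hypothetical session for illustration; the committer setting is the standard
# Hadoop property passed through Spark's spark.hadoop.* prefix.
spark = (
    SparkSession.builder
    .appName("partitioned-parquet-write")
    # v2 skips the serial job-end rename from _temporary
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    .getOrCreate()
)

df = spark.read.parquet("source_dir")  # assumed input, with a 'date' column

# One shuffle so that rows for each date land in the same task, producing
# fewer files per partition directory and a shorter commit phase.
(df.repartition("date")
   .write
   .partitionBy("date")
   .parquet("dest_dir"))
```

Repartitioning trades an extra shuffle during the job for far less file-moving after it, which is usually the better deal when the commit phase dominates.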