[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701444#comment-16701444 ] Jacky Li commented on SPARK-24630: -- Besides the CREATE TABLE/STREAM syntax to create the source and the sink, is there any syntax to manipulate the streaming job, such as starting and stopping it? If I understand correctly, the INSERT statement is currently proposed to kick off the Structured Streaming job. Since this streaming job is continuous, I am wondering whether there is a way to show/describe/stop it? > SPIP: Support SQLStreaming in Spark > --- > > Key: SPARK-24630 > URL: https://issues.apache.org/jira/browse/SPARK-24630 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0, 2.2.1 >Reporter: Jackey Lee >Priority: Minor > Labels: SQLStreaming > Attachments: SQLStreaming SPIP.pdf > > > At present, KafkaSQL, Flink SQL (which is actually based on Calcite), > SQLStream and StormSQL all provide a streaming SQL interface, with which users > with little knowledge about streaming can easily develop a stream processing > model. In Spark, we can also support a SQL API based on > Structured Streaming. > To support SQL streaming, there are two key points: > 1. The parser should be able to parse streaming-type SQL. > 2. The analyzer should be able to map metadata information to the corresponding > Relation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
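The comment above asks how a continuous job started by INSERT could later be shown or stopped. As a hedged illustration only (these class and method names are hypothetical, not Spark APIs or proposed SPIP syntax), the lifecycle being asked about could be sketched as a small job registry:

```python
# Hypothetical sketch (not Spark's API): a minimal registry illustrating the
# start / show / stop lifecycle the comment asks about for SQL streaming jobs.
class StreamingJobRegistry:
    def __init__(self):
        self._jobs = {}  # job name -> {"query": ..., "status": ...}

    def start(self, name, query):
        # In the SPIP's proposal an INSERT statement would kick off the job;
        # here we just record it as running.
        self._jobs[name] = {"query": query, "status": "RUNNING"}

    def show(self):
        # Analogous to a hypothetical SHOW STREAMS statement.
        return {n: j["status"] for n, j in self._jobs.items()}

    def stop(self, name):
        # Analogous to a hypothetical STOP STREAM statement.
        if name in self._jobs:
            self._jobs[name]["status"] = "STOPPED"

registry = StreamingJobRegistry()
registry.start("kafka_to_parquet", "INSERT INTO sink SELECT * FROM source")
registry.stop("kafka_to_parquet")
```

The sketch only shows the management surface the commenter is asking for; how such statements would map onto StreamingQuery handles is exactly the open design question.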
[jira] [Commented] (SPARK-19700) Design an API for pluggable scheduler implementations
[ https://issues.apache.org/jira/browse/SPARK-19700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701439#comment-16701439 ] Utkarsh Maheshwari commented on SPARK-19700: [~cgbaker], have you started working on it yet? Is there any way I can help? I would be glad to. > Design an API for pluggable scheduler implementations > - > > Key: SPARK-19700 > URL: https://issues.apache.org/jira/browse/SPARK-19700 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Matt Cheah >Priority: Major > > One point that was brought up in discussing SPARK-18278 was that schedulers > cannot easily be added to Spark without forking the whole project. The main > reason is that much of the scheduler's behavior fundamentally depends on the > CoarseGrainedSchedulerBackend class, which is not part of Spark's public API > and is in fact quite a complex module. As resource management and > allocation continue to evolve, Spark will need to be integrated with more > cluster managers, but maintaining support for all possible allocators in the > Spark project would be untenable. Furthermore, it would be impossible for > Spark to support proprietary frameworks that are developed by specific users > for their particular use cases. > Therefore, this ticket proposes making scheduler implementations fully > pluggable. The idea is that Spark will provide a Java/Scala interface that is > to be implemented by a scheduler backed by the cluster manager of > interest. The user can compile their scheduler's code into a JAR that is > placed on the driver's classpath. Finally, as is the case today, the > scheduler implementation is selected and dynamically loaded > depending on the user's provided master URL. > Determining the correct API is the most challenging problem. The current > CoarseGrainedSchedulerBackend handles many responsibilities, some of which > will be common across all cluster managers, and some of which will be specific > to a particular cluster manager. For example, the particular mechanism for > creating the executor processes will differ between YARN and Mesos, but, once > these executors have started running, the means of submitting tasks to them over > Netty RPC is identical across the board. > We must also consider a plugin model and interface for submitting the > application itself, because different cluster managers support different > configuration options, and thus the driver must be bootstrapped accordingly. > For example, in YARN mode the application and Hadoop configuration must be > packaged and shipped to the distributed cache prior to launching the job. A > prototype of a Kubernetes implementation starts a Kubernetes pod that runs > the driver in cluster mode.
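The ticket describes selecting and dynamically loading a scheduler implementation from the user's master URL. A hedged sketch of that dispatch pattern (the interface and backend names here are illustrative, not Spark's actual classes; a real system would load the implementation from a user-provided JAR on the classpath):

```python
# Illustrative sketch: map a master-URL scheme to a pluggable scheduler
# backend. All names here are hypothetical, not Spark's API.
from abc import ABC, abstractmethod


class SchedulerBackend(ABC):
    """Minimal stand-in for the pluggable interface the ticket proposes."""
    @abstractmethod
    def start(self):
        ...


class YarnBackend(SchedulerBackend):
    def start(self):
        return "yarn backend started"


class KubernetesBackend(SchedulerBackend):
    def start(self):
        return "k8s backend started"


# Registry keyed by master-URL scheme; a real implementation would discover
# classes dynamically (e.g. via reflection over a JAR on the classpath).
BACKENDS = {"yarn": YarnBackend, "k8s": KubernetesBackend}


def backend_for(master_url):
    """Pick a backend from the scheme of the user's master URL."""
    scheme = master_url.split("://")[0] if "://" in master_url else master_url
    cls = BACKENDS.get(scheme)
    if cls is None:
        raise ValueError(f"no scheduler backend registered for {master_url!r}")
    return cls()
```

The interesting design work, as the ticket notes, is not this dispatch step but deciding which of CoarseGrainedSchedulerBackend's responsibilities belong in the shared interface versus in each plugin.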
[jira] [Assigned] (SPARK-26159) Codegen for LocalTableScanExec
[ https://issues.apache.org/jira/browse/SPARK-26159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26159: --- Assignee: Juliusz Sompolski > Codegen for LocalTableScanExec > -- > > Key: SPARK-26159 > URL: https://issues.apache.org/jira/browse/SPARK-26159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.0.0 > > > Do codegen for LocalTableScanExec.
[jira] [Resolved] (SPARK-26159) Codegen for LocalTableScanExec
[ https://issues.apache.org/jira/browse/SPARK-26159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26159. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23127 [https://github.com/apache/spark/pull/23127] > Codegen for LocalTableScanExec > -- > > Key: SPARK-26159 > URL: https://issues.apache.org/jira/browse/SPARK-26159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.0.0 > > > Do codegen for LocalTableScanExec.
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701401#comment-16701401 ] Wenchen Fan commented on SPARK-26155: - Can you send a PR to revert SPARK-21052 and post the benchmark result there? Then we can start a discussion on that PR and merge it if everyone is fine with it. > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486&487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark 2.3 when running TPC-DS on SKX 8180. Several queries show serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark 2.1 on 3TB data. We investigated > this problem and found the root cause is in community patch SPARK-21052, > which adds metrics to the hash join process. The affected code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines of code and 126 seconds > with them.
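The report attributes the slowdown to metric updates on the hash-join probe path. A hedged toy sketch (not Spark's code; names are hypothetical) of why per-row metric bookkeeping in a hot loop differs from accumulating locally and updating once:

```python
# Toy illustration: per-row metric updates vs. one batched update after the
# loop. Both produce the same total, but touch the metric very differently.
class Metric:
    def __init__(self):
        self.value = 0
        self.updates = 0

    def add(self, v):
        self.value += v
        self.updates += 1


def probe_per_row(rows, metric):
    for _ in rows:
        metric.add(1)          # one metric update per probed row (hot path)


def probe_batched(rows, metric):
    local = 0
    for _ in rows:
        local += 1             # cheap local counter inside the loop
    metric.add(local)          # single metric update after the loop


m1, m2 = Metric(), Metric()
rows = range(100_000)
probe_per_row(rows, m1)
probe_batched(rows, m2)
```

In a JIT-compiled inner join loop the cost difference can be much larger than this Python toy suggests, since extra per-row work can also defeat compiler optimizations; the attached Q19 analyses are the actual evidence here.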
[jira] [Resolved] (SPARK-23545) [Spark-Core] port opened by the SparkDriver is vulnerable for flooding attacks
[ https://issues.apache.org/jira/browse/SPARK-23545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandeep katta resolved SPARK-23545. --- Resolution: Invalid > [Spark-Core] port opened by the SparkDriver is vulnerable for flooding attacks > -- > > Key: SPARK-23545 > URL: https://issues.apache.org/jira/browse/SPARK-23545 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: sandeep katta >Priority: Major > > The port opened by the SparkDriver is vulnerable to flooding attacks. > *Steps*: > Set spark.network.timeout=60s (can be any value). > Start the Thrift server in client mode; the logs below show that the > Spark driver opens the port for the AM and executors to communicate. > Logs: > 2018-03-01 16:11:16,497 | INFO | [main] | Successfully started service > *'sparkDriver'* on port *22643*. | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2018-03-01 16:11:17,265 | INFO | [main] | Successfully started service > 'SparkUI' on port 22950. | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2018-03-01 16:11:44,640 | INFO | [main] | Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 22663. | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2018-03-01 16:11:52,822 | INFO | [Thread-56] | Starting > ThriftBinaryCLIService on port 22550 with 5...501 worker threads | > org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:111) > Telnet to this port using the *telnet IP 22643* command and keep it idle. > After 60 seconds, check the status: the connection is still ESTABLISHED, > when it should have been terminated. > *lsof command output along with the date* > > host1:/var/ # date > Thu Mar 1 *16:12:55* CST 2018 > host1:/var/ # lsof | grep 22643 > java 66730 user1 292u IPv6 1482635919 0t0 TCP > host1:22643->*10.18.152.191:59297* (ESTABLISHED) > java 66730 user1 297u IPv6 1482374122 0t0 TCP > host1:22643->BLR118529:43894 (ESTABLISHED) > java 66730 user1 346u IPv6 1482314249 0t0 TCP host1:22643 (LISTEN) > host1:/var/ # date > Thu Mar 1 16:13:43 CST 2018 > host1:/var/ # date > Thu Mar 1 *16:16:55* CST 2018 > host1:/var/ # lsof | grep 22643 > java 66730 user1 292u IPv6 1482635919 0t0 TCP > host1:22643->*10.18.152.191:59297* (ESTABLISHED) > java 66730 user1 297u IPv6 1482374122 0t0 TCP > host1:22643->BLR118529:43894 (ESTABLISHED) > java 66730 user1 346u IPv6 1482314249 0t0 TCP host1:22643 (LISTEN) >
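The report's expectation is that an idle peer should be disconnected after the configured timeout rather than staying ESTABLISHED indefinitely. A hedged, self-contained sketch of that idle-timeout behavior at the socket level (plain Python sockets on localhost; nothing here is Spark's networking code, and a short timeout is used so the demo runs quickly):

```python
# Sketch: a server that drops connections which stay idle past a timeout,
# the behavior the report expected from the driver port.
import socket
import threading

IDLE_TIMEOUT = 0.5  # seconds; stands in for a setting like spark.network.timeout


def serve_once(server):
    conn, _ = server.accept()
    conn.settimeout(IDLE_TIMEOUT)
    try:
        while True:
            data = conn.recv(1024)   # raises socket.timeout if the peer is idle
            if not data:
                break                # peer closed normally
    except socket.timeout:
        pass                          # idle too long: drop the connection
    finally:
        conn.close()


server = socket.socket()
server.bind(("127.0.0.1", 0))        # pick any free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve_once, args=(server,))
t.start()

client = socket.socket()
client.connect(("127.0.0.1", port))
# Stay idle (like the telnet session in the report): send nothing.
closed = client.recv(1024)            # returns b'' once the server hangs up
t.join()
client.close()
server.close()
```

This is the opposite of what the report observed with the driver port, where the idle telnet connection remained ESTABLISHED past the timeout.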
[jira] [Resolved] (SPARK-24176) The hdfs file path with wildcard can not be identified when loading data
[ https://issues.apache.org/jira/browse/SPARK-24176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA resolved SPARK-24176. -- Resolution: Duplicate Closed as duplicate of JIRA 23425. > The hdfs file path with wildcard can not be identified when loading data > > > Key: SPARK-24176 > URL: https://issues.apache.org/jira/browse/SPARK-24176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: OS: SUSE11 > Spark Version: 2.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > # Launch spark-sql > # create table wild1 (time timestamp, name string, isright boolean, > datetoday date, num binary, height double, score float, decimaler > decimal(10,0), id tinyint, age int, license bigint, length smallint) row > format delimited fields terminated by ',' stored as textfile; > # Loaded data into the table as below; some cases failed and the behavior is > not consistent: > load data inpath '/user/testdemo1/user1/?ype* ' into table wild1; - Success > load data inpath '/user/testdemo1/user1/t??eddata60.txt' into table wild1; - > *Failed* > load data inpath '/user/testdemo1/user1/?ypeddata60.txt' into table wild1; - > Success > Exception as below: > > load data inpath '/user/testdemo1/user1/t??eddata61.txt' into table wild1; > 2018-05-04 13:16:25 INFO HiveMetaStore:746 - 0: get_database: one > 2018-05-04 13:16:25 INFO audit:371 - ugi=spark/had...@hadoop.com > ip=unknown-ip-addr cmd=get_database: one > 2018-05-04 13:16:25 INFO HiveMetaStore:746 - 0: get_table : db=one tbl=wild1 > 2018-05-04 13:16:25 INFO audit:371 - ugi=spark/had...@hadoop.com > ip=unknown-ip-addr cmd=get_table : db=one tbl=wild1 > 2018-05-04 13:16:25 INFO HiveMetaStore:746 - 0: get_table : db=one tbl=wild1 > 2018-05-04 13:16:25 INFO audit:371 - ugi=spark/had...@hadoop.com > ip=unknown-ip-addr cmd=get_table : db=one tbl=wild1 > *Error in query: LOAD DATA input path does not exist: > /user/testdemo1/user1/t??eddata61.txt;* > spark-sql> > The behavior is not consistent. This needs to be fixed for all combinations > of wildcard characters.
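The report mixes '?' (exactly one character) and '*' (any run of characters) glob wildcards. A quick check of which of the reported patterns should match a file named 'typeddata60.txt', using Python's fnmatch as a stand-in for the glob semantics (this sidesteps HDFS and LOAD DATA specifics entirely):

```python
# Which of the report's glob patterns should match 'typeddata60.txt'?
# fnmatch implements the same '?'/'*' wildcard semantics used here.
from fnmatch import fnmatch

name = "typeddata60.txt"

assert fnmatch(name, "?ype*")                 # '?' matches 't', '*' matches the rest
assert fnmatch(name, "t??eddata60.txt")       # the two '?' match 'yp'
assert fnmatch(name, "?ypeddata60.txt")       # '?' matches 't'
assert not fnmatch(name, "t??eddata61.txt")   # different trailing digits: no match
```

Under these semantics the pattern 't??eddata60.txt' should have matched, which is why the reported *Failed* case (while the other two succeeded) indicates inconsistent wildcard handling rather than a genuinely missing path.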
[jira] [Assigned] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26189: Assignee: (was: Apache Spark) > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Minor > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822
[jira] [Commented] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701383#comment-16701383 ] Apache Spark commented on SPARK-26189: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/23161 > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Minor > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822
[jira] [Assigned] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26189: Assignee: Apache Spark > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Minor > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822
[jira] [Updated] (SPARK-26196) Total tasks message in the stage is incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26196: --- Description: Total tasks message in the stage page is incorrect when there are failed or killed tasks. was: Total tasks in the stage page is incorrect when there are failed or killed tasks. > Total tasks message in the stage is incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Total tasks message in the stage page is incorrect when there are failed or > killed tasks. >
[jira] [Updated] (SPARK-26196) Total tasks message in the stage is incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26196: --- Summary: Total tasks message in the stage is incorrect, when there are failed or killed tasks (was: Total tasks message in the stage in incorrect, when there are failed or killed tasks) > Total tasks message in the stage is incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Total tasks in the stage page is incorrect when there are failed or killed > tasks. >
[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701321#comment-16701321 ] xuqianjin commented on SPARK-23410: --- hi [~maxgekk] [~hyukjin.kwon] Thank you very much. Can I just pull the code from the latest master branch and open a PR? > Unable to read jsons in charset different from UTF-8 > > > Key: SPARK-23410 > URL: https://issues.apache.org/jira/browse/SPARK-23410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Maxim Gekk >Priority: Major > Attachments: utf16WithBOM.json > > > Currently the JSON parser is forced to read JSON files in UTF-8. Such > behavior breaks backward compatibility with Spark 2.2.1 and previous versions, > which can read JSON files in UTF-16, UTF-32 and other encodings thanks to the > auto-detection mechanism of the Jackson library. We need to give users back > the ability to read JSON files in a specified charset and/or detect the > charset automatically, as before.
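The issue concerns JSON payloads in encodings other than UTF-8, such as the attached UTF-16-with-BOM file. A sketch of the underlying decoding step, independent of Spark and Jackson, shows why forcing UTF-8 breaks such files while decoding with the right charset works:

```python
# UTF-16 JSON bytes (with BOM) cannot be decoded as UTF-8, mirroring the
# reported breakage; decoding with the correct charset recovers the document.
import json

payload = '{"name": "spark"}'.encode("utf-16")  # Python's utf-16 codec adds a BOM

# Forcing UTF-8 on UTF-16 bytes fails.
try:
    json.loads(payload.decode("utf-8"))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the right charset (or a BOM-aware auto-detector, as Jackson
# did for Spark 2.2.1) recovers the document.
doc = json.loads(payload.decode("utf-16"))
```

Charset auto-detection in this situation typically works by inspecting the BOM and the byte pattern of the first characters, which is what the issue asks to restore as an option.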
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701317#comment-16701317 ] Ke Jia commented on SPARK-26155: [~viirya] Thanks for your reply. > ""Q19 analysis in Spark2.3 without L486 & 487.pdf" has Stage time and DAG in Spark 2.1, but the document title is Spark 2.3. Which version of Spark is used for it?" My Spark version is 2.3; the "Stage time and DAG in Spark 2.1" heading was my mistake, and I have re-uploaded the document. > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486&487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark 2.3 when running TPC-DS on SKX 8180. Several queries show serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark 2.1 on 3TB data. We investigated > this problem and found the root cause is in community patch SPARK-21052, > which adds metrics to the hash join process. The affected code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines of code and 126 seconds > with them.
[jira] [Updated] (SPARK-26182) Cost increases when optimizing scalaUDF
[ https://issues.apache.org/jira/browse/SPARK-26182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liao updated SPARK-26182: --- Description: Let's assume that we have a udf called splitUDF which outputs a map data. The SQL {code:java} select g['a'], g['b'] from ( select splitUDF(x) as g from table) tbl {code} will be optimized to the same logical plan of {code:java} select splitUDF(x)['a'], splitUDF(x)['b'] from table {code} which means that the splitUDF is executed twice instead of once. The optimization is from CollapseProject. I'm not sure whether this is a bug or not. Please tell me if I was wrong about this. was: Let's Assume that we have a udf called splitUDF which outputs a map data. The SQL {code:java} select g['a'], g['b'] from ( select splitUDF(x) as g from table) tbl {code} will be optimized to the same logical plan of {code:java} select splitUDF(x)['a'], splitUDF(x)['b'] from table {code} which means that the splitUDF is executed twice instead of once. The optimization is from CollapseProject. I'm not sure whether this is a bug or not. Please tell me if I was wrong about this. > Cost increases when optimizing scalaUDF > --- > > Key: SPARK-26182 > URL: https://issues.apache.org/jira/browse/SPARK-26182 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.4.0 >Reporter: Jiayi Liao >Priority: Major > > Let's assume that we have a udf called splitUDF which outputs a map data. > The SQL > {code:java} > select > g['a'], g['b'] > from >( select splitUDF(x) as g from table) tbl > {code} > will be optimized to the same logical plan of > {code:java} > select splitUDF(x)['a'], splitUDF(x)['b'] from table > {code} > which means that the splitUDF is executed twice instead of once. > The optimization is from CollapseProject. > I'm not sure whether this is a bug or not. Please tell me if I was wrong > about this. 
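The SPARK-26182 report above says that after CollapseProject inlines the inner projection, the UDF call appears once per extracted key and so runs twice per row instead of once. A toy sketch of that effect (plain Python, no Spark; names are hypothetical), counting invocations in the collapsed versus uncollapsed form:

```python
# Toy model of the reported behavior: inlining a subquery alias duplicates
# the UDF call at each use site, doubling the number of evaluations.
calls = {"n": 0}


def split_udf(x):
    """Stand-in for the map-producing UDF in the report."""
    calls["n"] += 1
    return {"a": x.upper(), "b": x.lower()}


row = "Spark"

# Collapsed form: select splitUDF(x)['a'], splitUDF(x)['b'] -- the UDF is
# re-evaluated for each extracted key.
collapsed = (split_udf(row)["a"], split_udf(row)["b"])
collapsed_calls = calls["n"]

# Uncollapsed form: select g['a'], g['b'] from (select splitUDF(x) as g ...)
# -- evaluate once, then extract both keys.
calls["n"] = 0
g = split_udf(row)
uncollapsed = (g["a"], g["b"])
uncollapsed_calls = calls["n"]
```

Both forms produce the same values, which is why the plans are considered equivalent by the optimizer; the difference is purely in how many times the (possibly expensive, possibly non-deterministic) UDF runs.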
[jira] [Created] (SPARK-26197) Spark master fails to detect driver process pause
Jialin LIu created SPARK-26197: -- Summary: Spark master fails to detect driver process pause Key: SPARK-26197 URL: https://issues.apache.org/jira/browse/SPARK-26197 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.2 Reporter: Jialin LIu I was using Spark 2.3.2 with a standalone cluster and submitted a job in cluster mode. After submitting the job, I deliberately paused the driver process (through the shell command "kill -stop (driver process id)") to see whether the master could detect this problem. The result shows that the driver is never stopped. All the executors try to talk back to the driver and give up after 10 minutes. The master can detect the executor failures and tries to assign new executor processes to redo the job. Each new executor tries to create an RPC connection with the driver and fails after 2 minutes. The master endlessly spawns new executors without detecting the driver failure.
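The report says the master never notices a SIGSTOPped driver. A hedged sketch of the kind of liveness check implied as missing (hypothetical names, not Spark's master code): track a last-heartbeat timestamp per driver and flag drivers whose heartbeat is older than a timeout. A paused process stops heartbeating and would be flagged. The clock is injected so the demo needs no real sleeping:

```python
# Sketch: heartbeat-based liveness detection. A driver paused with
# `kill -stop` stops heartbeating and shows up in dead_drivers().
class LivenessMonitor:
    def __init__(self, timeout, clock):
        self.timeout = timeout      # seconds without a heartbeat before "dead"
        self.clock = clock          # injectable time source, for testability
        self.last_seen = {}

    def heartbeat(self, driver_id):
        self.last_seen[driver_id] = self.clock()

    def dead_drivers(self):
        now = self.clock()
        return [d for d, t in self.last_seen.items() if now - t > self.timeout]


now = [0.0]
monitor = LivenessMonitor(timeout=60.0, clock=lambda: now[0])
monitor.heartbeat("driver-1")
now[0] = 30.0
monitor.heartbeat("driver-1")   # driver still responsive at t=30
now[0] = 120.0                  # driver paused: no heartbeats since t=30
```

Note that executor-to-driver RPC timeouts already fire in the reported scenario; the gap described is that the master itself draws no conclusion from repeated executor failures against the same driver.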
[jira] [Updated] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-26155: --- Attachment: (was: Q19 analysis in Spark2.3 without L486 & 487.pdf) > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486&487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark2.3 when running TPC-DS on SKX 8180. Several queries have serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark2.1 on 3TB data. We investigated > this problem and figured out the root cause is in community patch SPARK-21052 > which add metrics to hash join process. And the impact code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines code and 126 seconds > with these code.
[jira] [Updated] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-26155: --- Attachment: Q19 analysis in Spark2.3 without L486&487.pdf > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486&487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark2.3 when running TPC-DS on SKX 8180. Several queries have serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark2.1 on 3TB data. We investigated > this problem and figured out the root cause is in community patch SPARK-21052 > which add metrics to hash join process. And the impact code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines code and 126 seconds > with these code.
[jira] [Commented] (SPARK-26196) Total tasks message in the stage in incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701294#comment-16701294 ] Apache Spark commented on SPARK-26196: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23160 > Total tasks message in the stage in incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Total tasks in the stage page is incorrect when there are failed or killed > tasks. >
[jira] [Assigned] (SPARK-26196) Total tasks message in the stage in incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26196: Assignee: Apache Spark > Total tasks message in the stage in incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Assignee: Apache Spark >Priority: Major > > Total tasks in the stage page is incorrect when there are failed or killed > tasks. >
[jira] [Assigned] (SPARK-26196) Total tasks message in the stage in incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26196: Assignee: (was: Apache Spark) > Total tasks message in the stage in incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Total tasks in the stage page is incorrect when there are failed or killed > tasks. >
[jira] [Commented] (SPARK-26195) Correct exception messages in some classes
[ https://issues.apache.org/jira/browse/SPARK-26195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701279#comment-16701279 ] Apache Spark commented on SPARK-26195: -- User 'lcqzte10192193' has created a pull request for this issue: https://github.com/apache/spark/pull/23154 > Correct exception messages in some classes > -- > > Key: SPARK-26195 > URL: https://issues.apache.org/jira/browse/SPARK-26195 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: lichaoqun >Priority: Minor > > Some UnsupportedOperationException messages do not match their method names. > This PR corrects those messages.
[jira] [Created] (SPARK-26196) Total tasks message in the stage is incorrect, when there are failed or killed tasks
shahid created SPARK-26196: -- Summary: Total tasks message in the stage is incorrect, when there are failed or killed tasks Key: SPARK-26196 URL: https://issues.apache.org/jira/browse/SPARK-26196 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.0.0 Reporter: shahid Total tasks in the stage page is incorrect when there are failed or killed tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26196) Total tasks message in the stage is incorrect, when there are failed or killed tasks
[ https://issues.apache.org/jira/browse/SPARK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701285#comment-16701285 ] shahid commented on SPARK-26196: I will raise a PR > Total tasks message in the stage is incorrect, when there are failed or > killed tasks > > > Key: SPARK-26196 > URL: https://issues.apache.org/jira/browse/SPARK-26196 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Total tasks in the stage page is incorrect when there are failed or killed > tasks. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26195) Correct exception messages in some classes
[ https://issues.apache.org/jira/browse/SPARK-26195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701281#comment-16701281 ] Apache Spark commented on SPARK-26195: -- User 'lcqzte10192193' has created a pull request for this issue: https://github.com/apache/spark/pull/23154 > Correct exception messages in some classes > -- > > Key: SPARK-26195 > URL: https://issues.apache.org/jira/browse/SPARK-26195 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: lichaoqun >Priority: Minor > > UnsupportedOperationException messages do not match the method names. This > PR corrects these messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26195) Correct exception messages in some classes
[ https://issues.apache.org/jira/browse/SPARK-26195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26195: Assignee: (was: Apache Spark) > Correct exception messages in some classes > -- > > Key: SPARK-26195 > URL: https://issues.apache.org/jira/browse/SPARK-26195 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: lichaoqun >Priority: Minor > > UnsupportedOperationException messages do not match the method names. This > PR corrects these messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26195) Correct exception messages in some classes
[ https://issues.apache.org/jira/browse/SPARK-26195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26195: Assignee: Apache Spark > Correct exception messages in some classes > -- > > Key: SPARK-26195 > URL: https://issues.apache.org/jira/browse/SPARK-26195 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: lichaoqun >Assignee: Apache Spark >Priority: Minor > > UnsupportedOperationException messages do not match the method names. This > PR corrects these messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26195) Correct exception messages in some classes
lichaoqun created SPARK-26195: - Summary: Correct exception messages in some classes Key: SPARK-26195 URL: https://issues.apache.org/jira/browse/SPARK-26195 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: lichaoqun UnsupportedOperationException messages do not match the method names. This PR corrects these messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10816) EventTime based sessionization
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701275#comment-16701275 ] Yuanjian Li commented on SPARK-10816: - Thanks for the fix and benchmark by [~ivoson]; the fix commit has been merged into [https://github.com/apache/spark/pull/22583.] [~kabhwan] Is it possible to combine our proposals and fix this issue together? I think the benchmarks are currently flattened, and I hope we can solve this problem together. > EventTime based sessionization > -- > > Key: SPARK-10816 > URL: https://issues.apache.org/jira/browse/SPARK-10816 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Reynold Xin >Priority: Major > Attachments: SPARK-10816 Support session window natively.pdf, Session > Window Support For Structure Streaming.pdf > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26194) Support automatic spark.authenticate secret in Kubernetes backend
Marcelo Vanzin created SPARK-26194: -- Summary: Support automatic spark.authenticate secret in Kubernetes backend Key: SPARK-26194 URL: https://issues.apache.org/jira/browse/SPARK-26194 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.0.0 Reporter: Marcelo Vanzin Currently k8s inherits the default behavior for {{spark.authenticate}}, which is that the user must provide an auth secret. k8s doesn't have that requirement and could instead generate its own unique per-app secret, and propagate it to executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
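The generation step the ticket above proposes can be sketched as follows. This is a hedged illustration in Python for brevity (Spark itself is Scala/Java); `generate_app_secret` is an invented name, not Spark's actual implementation.

```python
import secrets

# Hypothetical sketch: mint a unique per-app auth secret, as the issue
# proposes the k8s backend could do instead of requiring the user to
# supply one via spark.authenticate.secret.
def generate_app_secret(bits: int = 256) -> str:
    # token_urlsafe(n) returns a URL-safe base64 string for n random bytes
    return secrets.token_urlsafe(bits // 8)
```

The backend would generate this once per application and propagate it to executors (for example, via a Kubernetes Secret) rather than asking the user for it.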
[jira] [Resolved] (SPARK-24219) Improve the docker build script to avoid copying everything in example
[ https://issues.apache.org/jira/browse/SPARK-24219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24219. Resolution: Duplicate I fixed this as part of SPARK-26025. > Improve the docker build script to avoid copying everything in example > -- > > Key: SPARK-24219 > URL: https://issues.apache.org/jira/browse/SPARK-24219 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Saisai Shao >Priority: Minor > > The current docker build script copies everything under the example folder to > the docker image when invoked from a dev path; this unnecessarily copies too many > files, such as temporary build files, into the docker image. So here I propose to > improve the script. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24383. Resolution: Not A Problem This has been working reliably for me. If your k8s server is not gc'ing old state, then it's probably an issue with your server. > spark on k8s: "driver-svc" are not getting deleted > -- > > Key: SPARK-24383 > URL: https://issues.apache.org/jira/browse/SPARK-24383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Lenin >Priority: Major > > When the driver pod exits, the "*driver-svc" services created for the driver > are not cleaned up. This causes accumulation of services in the k8s layer; at > one point no more services can be created. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24577) Spark submit fails with documentation example spark-pi
[ https://issues.apache.org/jira/browse/SPARK-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24577. Resolution: Duplicate > Spark submit fails with documentation example spark-pi > -- > > Key: SPARK-24577 > URL: https://issues.apache.org/jira/browse/SPARK-24577 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1 >Reporter: Kuku1 >Priority: Major > > The Spark-submit example in the [K8s > documentation|http://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode] > fails for me. > {code:java} > .\spark-submit.cmd --master k8s://https://my-k8s:8443 > --conf spark.kubernetes.namespace=my-namespace --deploy-mode cluster --name > spark-pi --class org.apache.spark.examples.SparkPi > --conf spark.executor.instances=5 > --conf spark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 > --conf spark.kubernetes.driver.pod.name=spark-pi-driver > local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar > {code} > Error in the driver log: > {code:java} > ++ id -u > + myuid=0 > ++ id -g > + mygid=0 > ++ getent passwd 0 > + uidentry=root:x:0:0:root:/root:/bin/ash > + '[' -z root:x:0:0:root:/root:/bin/ash ']' > + SPARK_K8S_CMD=driver > + '[' -z driver ']' > + shift 1 > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_JAVA_OPTS > + '[' -n > '/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' > ']' > + > SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' > + '[' -n '' ']' > + case "$SPARK_K8S_CMD" in > + CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" > -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY > -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS > $SPARK_DRIVER_ARGS) > + exec 
/sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java > -Dspark.kubernetes.namespace=my-namespace -Dspark.driver.port=7078 > -Dspark.master=k8s://https://my-k8s:8443 > -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar,/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar > -Dspark.driver.blockManager.port=7079 > -Dspark.app.id=spark-311b7351345240fd89d6d86eaabdff6f > -Dspark.kubernetes.driver.pod.name=spark-pi-driver > -Dspark.executor.instances=5 -Dspark.app.name=spark-pi > -Dspark.driver.host=spark-pi-ef6be7cac60a3f789f9714b2ebd1c68c-driver-svc.my-namespace.svc > -Dspark.submit.deployMode=cluster > -Dspark.kubernetes.executor.podNamePrefix=spark-pi-ef6be7cac60a3f789f9714b2ebd1c68c > -Dspark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 -cp > ':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' > -Xms1g -Xmx1g -Dspark.driver.bindAddress=172.101.1.40 > org.apache.spark.examples.SparkPi > Error: Could not find or load main class org.apache.spark.examples.SparkPi > {code} > I am also using spark-operator to run the example and this one works for me. 
> The spark-operator outputs its command to spark-submit: > > {code:java} > ++ id -u > + myuid=0 > ++ id -g > + mygid=0 > ++ getent passwd 0 > + uidentry=root:x:0:0:root:/root:/bin/ash > + '[' -z root:x:0:0:root:/root:/bin/ash ']' > + SPARK_K8S_CMD=driver > + '[' -z driver ']' > + shift 1 > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_JAVA_OPTS > + '[' -n > /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar > ']' > + > SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' > + '[' -n '' ']' > + case "$SPARK_K8S_CMD" in > + CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" > -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY > -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS > $SPARK_DRIVER_ARGS) > + exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java > -Dspark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi-2557211557 > -Dspark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 > -Dspark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi > -Dspark.app.name=spark-pi > -Dspark.executor.instances=7 > -Dspark.driver.blockManager.port=7079 > -Dspark.driver.cores=0.10 > -Dspark.kubernetes.driver.label.version=2.3.0 > -Dspark.kubernetes.executor.podNamePrefix=spark-pi-607e0943cf32319883cc3beb2e02be4f > -Dspark.executor.memory=512m > -Dspark.kubernetes.driver.label.
[jira] [Resolved] (SPARK-24600) Improve support for building different types of images in dockerfile
[ https://issues.apache.org/jira/browse/SPARK-24600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24600. Resolution: Duplicate > Improve support for building different types of images in dockerfile > > > Key: SPARK-24600 > URL: https://issues.apache.org/jira/browse/SPARK-24600 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Anirudh Ramanathan >Priority: Major > > Our docker images currently build and push docker images for pyspark and > java/scala. > We should be able to build/push either one of them. In the future, we'll have > this extended to sparkR, the shuffle service, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26096) k8s integration tests should run R tests
[ https://issues.apache.org/jira/browse/SPARK-26096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26096. Resolution: Duplicate > k8s integration tests should run R tests > > > Key: SPARK-26096 > URL: https://issues.apache.org/jira/browse/SPARK-26096 > Project: Spark > Issue Type: Task > Components: Kubernetes, Tests >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Priority: Major > > Noticed while debugging a completely separate things. > - the jenkins job doesn't enable the SparkR profile > - KubernetesSuite doesn't include the RTestsSuite trait > even if you fix those two, it seems the tests are broken: > {noformat} > [info] - Run SparkR on simple dataframe.R example *** FAILED *** (2 minutes, > 3 seconds) > [info] at > org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308) > [info] at > org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307) > [info] at > org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) > [info] at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.runSparkApplicationAndVerifyCompletion(KubernetesSuite.scala:274) > [info] at > org.apache.spark.deploy.k8s.integrationtest.RTestsSuite.$anonfun$$init$$1(RTestsSuite.scala:26) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26125) Delegation Token seems not appropriately stored on secrets of Kubernetes/Kerberized HDFS
[ https://issues.apache.org/jira/browse/SPARK-26125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701200#comment-16701200 ] Marcelo Vanzin commented on SPARK-26125: Pretty sure this works with my patch for SPARK-25815, but since this is a different bug from that one, will keep it separate (and close together). > Delegation Token seems not appropriately stored on secrets of > Kubernetes/Kerberized HDFS > > > Key: SPARK-26125 > URL: https://issues.apache.org/jira/browse/SPARK-26125 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Kei Kori >Priority: Minor > Attachments: spark-submit-stern.log > > > I tried Kerberos authentication with Kubernetes Resource Manager and an > external Hadoop and KDC. > I tested built on > [6c9c84f|https://github.com/apache/spark/commit/6c9c84ffb9c8d98ee2ece7ba4b010856591d383d] > (master + SPARK-23257). > {code} > $ bin/spark-submit \ > --deploy-mode cluster \ > --class org.apache.spark.examples.HdfsTest \ > --master k8s://https://master01.node:6443 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.app.name=spark-hdfs \ > --conf spark.executer.instances=1 \ > --conf > spark.kubernetes.container.image=docker-registry/kkori/spark:6c9c84f \ > --conf spark.kubernetes.kerberos.enabled=true \ > --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \ > --conf spark.kubernetes.kerberos.keytab=/tmp/test.keytab \ > --conf > spark.kubernetes.kerberos.principal=t...@external.kerberos.realm.com \ > --conf spark.kubernetes.hadoop.configMapName=hadoop-conf \ > local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar > {code} > I successfully submitted into Kubernetes RM and Kubernetes spawned > spark-driver and executors, > but Hadoop Delegation Token seems wrongly stored into Kubernetes secrets, > since that contains only header like below: > {code} > $ kubectl get secrets 
spark-hdfs-1542613661459-delegation-tokens -o > jsonpath='{.data.hadoop-tokens}' | {base64 -d | cat -A; echo;} > HDTS^@^@^@ > {code} > The result of "kubectl get secrets" should be like the following (I masked > the actual result): > {code} > HDTS^@^ha-hdfs:test^@^_t...@external.kerberos.realm.com^@^@ > {code} > As a result, spark-driver threw a GSSException for each access to HDFS. > Full logs (submit, driver, executor) are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25744) Allow kubernetes integration tests to be run against a real cluster.
[ https://issues.apache.org/jira/browse/SPARK-25744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25744. Resolution: Duplicate > Allow kubernetes integration tests to be run against a real cluster. > > > Key: SPARK-25744 > URL: https://issues.apache.org/jira/browse/SPARK-25744 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Prashant Sharma >Priority: Minor > > Currently, tests can only run against a minikube cluster; testing against a > real cluster gives more flexibility to write tests with a larger number of > executors and more resources. > It will also be helpful if minikube is unavailable for testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26064) Unable to fetch jar from remote repo while running spark-submit on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-26064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26064. Resolution: Invalid I'm closing this for the time being. If you have a question please use the mailing lists. If you're reporting an issue, please provide more information (like the actual error). > Unable to fetch jar from remote repo while running spark-submit on kubernetes > - > > Key: SPARK-26064 > URL: https://issues.apache.org/jira/browse/SPARK-26064 > Project: Spark > Issue Type: Question > Components: Kubernetes >Affects Versions: 2.3.2 >Reporter: Bala Bharath Reddy Resapu >Priority: Major > > I am trying to run spark on kubernetes with a docker image. My requirement is > to download the jar from the external repo while running spark-submit. I am > able to download the jar using wget in the container but it doesn't work when > inputting in the spark-submit command. I am not packaging the jar with docker > image. It works fine when I input the jar file inside the docker image. > > ./bin/spark-submit \ > --master k8s://[https://ip:port|https://ipport/] \ > --deploy-mode cluster \ > --name test3 \ > --class hello \ > --conf spark.kubernetes.container.image.pullSecrets=abcd \ > --conf spark.kubernetes.container.image=spark:h2.0 \ > [https://devops.com/artifactory/local/testing/testing_2.11/h|https://bala.bharath.reddy.resapu%40ibm.com:akcp5bcbktykg2ti28sju4gtebsqwkg2mqkaf9w6g5rdbo3iwrwx7qb1m5dokgd54hdru2...@na.artifactory.swg-devops.com/artifactory/txo-cedp-garage-artifacts-sbt-local/testing/testing_2.11/arithmetic.jar]ello.jar -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26193) Implement shuffle write metrics in SQL
Xiao Li created SPARK-26193: --- Summary: Implement shuffle write metrics in SQL Key: SPARK-26193 URL: https://issues.apache.org/jira/browse/SPARK-26193 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xiao Li Assignee: Yuanjian Li -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26191) Control number of truncated fields
[ https://issues.apache.org/jira/browse/SPARK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26191: Assignee: Apache Spark > Control number of truncated fields > -- > > Key: SPARK-26191 > URL: https://issues.apache.org/jira/browse/SPARK-26191 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Minor > > Currently, the threshold for truncated fields converted to string can be > controlled only via a global SQL config. We need to add a maxFields parameter to all > functions/methods that could potentially produce a truncated string from a > sequence of fields. > One of the use cases is toFile. This method aims to output untruncated plans. > For now, users have to set the global config to flush whole plans. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
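The change SPARK-26191 describes, threading an explicit `maxFields` argument through instead of reading a global SQL config, can be sketched as follows. This is a hedged illustration in Python for brevity (Spark's own code is Scala); the names are invented and do not reflect Spark's actual API.

```python
# Hypothetical sketch: a truncation helper that takes max_fields as an
# explicit argument instead of consulting a global config.
def truncated_string(fields, sep, max_fields):
    """Join fields with sep, truncating to at most max_fields entries."""
    if len(fields) <= max_fields:
        return sep.join(fields)
    # keep the first max_fields - 1 entries and summarize the rest
    shown = fields[:max(max_fields - 1, 0)]
    return sep.join(shown + [f"... {len(fields) - len(shown)} more fields"])
```

A caller such as the hypothetical `toFile` path would then pass a very large `max_fields` to emit untruncated plans without touching any global setting.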
[jira] [Resolved] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26190. Resolution: Won't Fix I'm closing this for now until I see a better use case. It seems you can easily do this if you want to, without needing changes in Spark. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > This is an improvement request for the SparkLauncher class, which > is responsible for executing the builtin spark-submit script via the Java API. > In my use case, there is a custom wrapper script which helps integrate > the security features while submitting the spark job using the builtin > spark-submit. > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List<String> cmd = new ArrayList<>(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > > It has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder.
> 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only when the parameter is not set should the default spark-submit script > be taken from the SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
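The proposed solution in the ticket above can be sketched roughly as follows. This is a hedged illustration in Python for brevity (SparkLauncher itself is Java); `custom_script` is an invented parameter and not part of SparkLauncher's actual API.

```python
import os

# Hypothetical sketch: resolve the submitter script from an optional
# user-supplied path, falling back to the hard-coded SPARK_HOME/bin
# location quoted in the issue description.
def resolve_submit_script(spark_home, custom_script=None, is_windows=False):
    if custom_script:
        # proposal step 1: an explicitly configured script wins
        return custom_script
    # proposal step 2: otherwise fall back to the builtin script
    script = "spark-submit.cmd" if is_windows else "spark-submit"
    return os.path.join(spark_home, "bin", script)
```

In a real implementation this resolution would replace the hard-coded lookup inside `createBuilder()`, so the rest of the launch logic stays unchanged.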
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701140#comment-16701140 ] Marcelo Vanzin commented on SPARK-26190: bq. Can you give me one good reason that why you must have the script name hard-coded in the SparkLauncher? Because that is the Spark public interface. You run Spark using spark-submit, and you customize spark-submit using spark-env.sh. If that does not work for you, you'll need a little more justification than "I want to run my own script". You have a bunch of options, including patching spark-submit when deploying in your environment, that don't require changes in Spark at all. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > The improvement request is for improvement in the SparkLauncher class which > is responsible to execute builtin spark-submit script using Java API. > In my use case, there is a custom wrapper script which help in integrating > the security features while submitting the spark job using builtin > spark-submit. > Currently the script name is hard-coded in the 'createBuilder()' method of > org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List cmd = new ArrayList(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. 
> }{code} > > > It has following issues, which prevents its usage in certain scenario. > 1) Developer may not use their own custom scripts with different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only in case the parameter is not set, the default spark-submit script > should be taken from SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26191) Control number of truncated fields
[ https://issues.apache.org/jira/browse/SPARK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701139#comment-16701139 ] Apache Spark commented on SPARK-26191: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/23159 > Control number of truncated fields > -- > > Key: SPARK-26191 > URL: https://issues.apache.org/jira/browse/SPARK-26191 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > Currently, the threshold for truncated fields converted to string can be > controlled only via a global SQL config. We need to add a maxFields parameter to all > functions/methods that could potentially produce a truncated string from a > sequence of fields. > One of the use cases is toFile. This method aims to output untruncated plans. > For now, users have to set the global config to flush whole plans. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26191) Control number of truncated fields
[ https://issues.apache.org/jira/browse/SPARK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26191: Assignee: (was: Apache Spark) > Control number of truncated fields > -- > > Key: SPARK-26191 > URL: https://issues.apache.org/jira/browse/SPARK-26191 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > Currently, the threshold for truncated fields converted to string can be > controlled only via a global SQL config. We need to add a maxFields parameter to all > functions/methods that could potentially produce a truncated string from a > sequence of fields. > One of the use cases is toFile. This method aims to output untruncated plans. > For now, users have to set the global config to flush whole plans. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26192) MesosClusterScheduler reads options from dispatcher conf instead of submission conf
Martin Loncaric created SPARK-26192: --- Summary: MesosClusterScheduler reads options from dispatcher conf instead of submission conf Key: SPARK-26192 URL: https://issues.apache.org/jira/browse/SPARK-26192 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0 Reporter: Martin Loncaric There are at least two options accessed in MesosClusterScheduler that should come from the submission's configuration instead of the dispatcher's: spark.app.name spark.mesos.fetchCache.enable This means that all Mesos tasks for Spark drivers have uninformative names of the form "Driver for (MainClass)" rather than the configured application name, and Spark drivers never cache files. Coincidentally, the spark.mesos.fetchCache.enable option is misnamed, as referenced in the linked JIRA. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
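The fix direction implied by the report above, preferring the submission's configuration over the dispatcher's when building the driver's Mesos task, can be sketched as follows. This is a hedged illustration in Python; plain dicts stand in for the real SparkConf/submission objects, and `resolve_option` is an invented name.

```python
# Hypothetical sketch: read driver options from the submission's
# properties first, falling back to the dispatcher's conf only when
# the submission did not set them.
def resolve_option(key, submission_conf, dispatcher_conf, default=None):
    if key in submission_conf:
        return submission_conf[key]
    return dispatcher_conf.get(key, default)
```

With this precedence, `spark.app.name` set on the submission would name the Mesos task instead of the dispatcher's generic value.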
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701136#comment-16701136 ] Gyanendra Dwivedi commented on SPARK-26190: --- [~vanzin] I am sorry, I am not able to explain to you the "real" enterprise-level limitations for developers. I cannot expose more on why a custom script is the only option! Can you give me one good reason why you must have the script name hard-coded in the SparkLauncher? Why should SparkLauncher expect the script name "spark-submit" for non-Windows OSes, and only in the path SPARK_HOME/bin? I am not willing to invest my time justifying any more; feel free to close it or whatever. This was a Spark improvement request arising from a developer's real-world challenge. Anyway, by the time it comes back to me (if someone fixes it), it's too late for my bus. I will just patch it and move on, as I do with most of the poorly written APIs. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > The request is for an improvement in the SparkLauncher class, which is > responsible for executing the built-in spark-submit script via the Java API. > In my use case, there is a custom wrapper script which helps integrate > security features while submitting the Spark job using the built-in > spark-submit. > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class:
> {code:java}
> private ProcessBuilder createBuilder() {
>   List<String> cmd = new ArrayList<>();
>   String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
>   cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
>   cmd.addAll(this.builder.buildSparkSubmitArgs());
>   ..
>   ..
> }{code}
> > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only when the parameter is not set should the default spark-submit script > be taken from the SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26191) Control number of truncated fields
Maxim Gekk created SPARK-26191: -- Summary: Control number of truncated fields Key: SPARK-26191 URL: https://issues.apache.org/jira/browse/SPARK-26191 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.0 Reporter: Maxim Gekk Currently, the threshold for truncated fields converted to string can be controlled via a global SQL config. The maxFields parameter needs to be added to all functions/methods that could potentially produce a truncated string from a sequence of fields. One of the use cases is toFile. This method aims to output non-truncated plans. For now, users have to set the global config to flush whole plans. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
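The maxFields idea can be illustrated with a small sketch. This is a hypothetical helper, not Spark's actual truncatedString implementation: the method name and the "... N more fields" placeholder format are assumptions for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of per-call truncation control: instead of reading
// a single global config, each rendering call receives its own maxFields
// limit. Passing a large enough maxFields yields an untruncated string,
// which is what a method like toFile would want.
public class TruncateSketch {
    public static String truncatedString(List<String> fields, int maxFields) {
        if (fields.size() <= maxFields) {
            return String.join(", ", fields); // nothing to truncate
        }
        String shown = fields.stream()
                .limit(maxFields)
                .collect(Collectors.joining(", "));
        int hidden = fields.size() - maxFields;
        return shown + ", ... " + hidden + " more fields";
    }
}
```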
[jira] [Updated] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-26189: -- Priority: Minor (was: Major) > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Minor > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700937#comment-16700937 ] Marcelo Vanzin commented on SPARK-26190: If you need to run things before spark-submit runs, consider writing your own spark-env.sh that does what you need. That's a supported feature of Spark. Sorry, but I still don't think it's a good idea to provide the functionality you're asking for. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > The request is for an improvement in the SparkLauncher class, which is > responsible for executing the built-in spark-submit script via the Java API. > In my use case, there is a custom wrapper script which helps integrate > security features while submitting the Spark job using the built-in > spark-submit. > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class:
> {code:java}
> private ProcessBuilder createBuilder() {
>   List<String> cmd = new ArrayList<>();
>   String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
>   cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
>   cmd.addAll(this.builder.buildSparkSubmitArgs());
>   ..
>   ..
> }{code}
> > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation.
Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only in case the parameter is not set, the default spark-submit script > should be taken from SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyanendra Dwivedi updated SPARK-26190: -- Description: The request is for an improvement in the SparkLauncher class, which is responsible for executing the built-in spark-submit script via the Java API. In my use case, there is a custom wrapper script which helps integrate security features while submitting the Spark job using the built-in spark-submit. Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class:
{code:java}
private ProcessBuilder createBuilder() {
  List<String> cmd = new ArrayList<>();
  String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
  cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
  cmd.addAll(this.builder.buildSparkSubmitArgs());
  ..
  ..
}{code}
This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom scripts with a different name. They are forced to use the one shipped with the installation. Overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given an optional parameter to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default spark-submit script be taken from the SPARK_HOME/bin folder. was: Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class:
{code:java}
private ProcessBuilder createBuilder() {
  List<String> cmd = new ArrayList<>();
  String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
  cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
  cmd.addAll(this.builder.buildSparkSubmitArgs());
  ..
  ..
}{code}
This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom scripts with a different name. They are forced to use the one shipped with the installation. Overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given an optional parameter to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default spark-submit script be taken from the SPARK_HOME/bin folder. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > The request is for an improvement in the SparkLauncher class, which is > responsible for executing the built-in spark-submit script via the Java API. > In my use case, there is a custom wrapper script which helps integrate > security features while submitting the Spark job using the built-in > spark-submit. > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class:
> {code:java}
> private ProcessBuilder createBuilder() {
>   List<String> cmd = new ArrayList<>();
>   String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
>   cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
>   cmd.addAll(this.builder.buildSparkSubmitArgs());
>   ..
>   ..
> }{code}
> > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option.
[jira] [Comment Edited] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700909#comment-16700909 ] Gyanendra Dwivedi edited comment on SPARK-26190 at 11/27/18 7:51 PM: - [~vanzin] If it were so easy to execute any custom script using {{Runtime.getRuntime().exec()}}, then why does SparkLauncher exist to execute the built-in spark-submit script? Creating fake symlinks etc. is not a viable solution for production servers, where the installation location may change with a new version. An ad hoc solution like creating a symlink should not be a reason for closing this feature request. I don't know why it was never thought to keep things configurable. Hard-coding, assuming a specific environment/use-case setup, and forcing developers to look for workarounds should not be encouraged. was (Author: gm_dwivedi): [~vanzin] If it were so easy to run any custom script using {{Runtime.getRuntime().exec()}}, then why does SparkLauncher exist to execute the built-in spark-submit script? Creating fake symlinks etc. is not a viable solution for production servers, where the installation location may change with a new version. An ad hoc solution like creating a symlink should not be a reason for closing this feature request. I don't know why it was never thought to keep things configurable. Hard-coding, assuming a specific environment/use-case setup, and forcing developers to look for workarounds should not be encouraged.
> SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List cmd = new ArrayList(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > > It has following issues, which prevents its usage in certain scenario. > 1) Developer may not use their own custom scripts with different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only in case the parameter is not set, the default spark-submit script > should be taken from SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700909#comment-16700909 ] Gyanendra Dwivedi edited comment on SPARK-26190 at 11/27/18 7:44 PM: - [~vanzin] If it was so easy to run or any custom script using {{Runtime.getRuntime().exec(); then why does sparkLauncher exist to execute builtin spark-submit script?}} Creating fake symlinks etc is not a viable solution for production servers where installation location just may change with new version etc. Creating a symlink like adhoc solution should not be a reason for closing this feature request. Don't know why it was never thought to keep things configurable. Hard coding, assuming a specific environment/use case setup and forcing developers to look for work around should not be encouraged. was (Author: gm_dwivedi): [~vanzin] If it was so easy to run spark-submit script using {{Runtime.getRuntime().exec(); then why does sparkLauncher exist?}} Creating fake symlinks etc is not a viable solution for production servers where installation location just may change with new version etc. Creating a symlink like adhoc solution should not be a reason for closing this feature request. Don't know why it was never thought to keep things configurable. Hard coding, assuming a specific environment/use case setup and forcing developers to look for work around should not be encouraged. 
> SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List cmd = new ArrayList(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > > It has following issues, which prevents its usage in certain scenario. > 1) Developer may not use their own custom scripts with different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given an optional parameter to set their own custom > script, which may be located at any path. > 2) Only in case the parameter is not set, the default spark-submit script > should be taken from SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyanendra Dwivedi updated SPARK-26190: -- Description: Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class:
{code:java}
private ProcessBuilder createBuilder() {
  List<String> cmd = new ArrayList<>();
  String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
  cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
  cmd.addAll(this.builder.buildSparkSubmitArgs());
  ..
  ..
}{code}
This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom scripts with a different name. They are forced to use the one shipped with the installation. Overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given an optional parameter to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default spark-submit script be taken from the SPARK_HOME/bin folder. was: Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class:
{code:java}
private ProcessBuilder createBuilder() {
  List<String> cmd = new ArrayList<>();
  String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
  cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
  cmd.addAll(this.builder.buildSparkSubmitArgs());
  ..
  ..
}{code}
This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom scripts with a different name. They are forced to use the one shipped with the installation. Overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given the option to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default be taken from the SPARK_HOME/bin folder. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class:
> {code:java}
> private ProcessBuilder createBuilder() {
>   List<String> cmd = new ArrayList<>();
>   String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
>   cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
>   cmd.addAll(this.builder.buildSparkSubmitArgs());
>   ..
>   ..
> }{code}
> > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given the option to set their own custom script, which may > be located at any path. > 2) Only when the parameter is not set should the default be taken from > the SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700909#comment-16700909 ] Gyanendra Dwivedi commented on SPARK-26190: --- [~vanzin] If it were so easy to run the spark-submit script using {{Runtime.getRuntime().exec()}}, then why does SparkLauncher exist? Creating fake symlinks etc. is not a viable solution for production servers, where the installation location may change with a new version. An ad hoc solution like creating a symlink should not be a reason for closing this feature request. I don't know why it was never thought to keep things configurable. Hard-coding, assuming a specific environment/use-case setup, and forcing developers to look for workarounds should not be encouraged. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > the org.apache.spark.launcher.SparkLauncher class:
> {code:java}
> private ProcessBuilder createBuilder() {
>   List<String> cmd = new ArrayList<>();
>   String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit";
>   cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script}));
>   cmd.addAll(this.builder.buildSparkSubmitArgs());
>   ..
>   ..
> }{code}
> > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom scripts with a different name. They > are forced to use the one shipped with the installation.
Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given option to set their own custom script, which may > be located at any path. > 2) Only in case the parameter is not set, the default should be taken from > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700892#comment-16700892 ] Gyanendra Dwivedi commented on SPARK-26190: --- [~vanzin] I have a wrapper custom script on top of spark-submit script, which has certain security check before it calls the built-in spark-submit script to submit the job. I have to use this script to launch the spark job using a Java program. Do I have any other option, kindly help. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List cmd = new ArrayList(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > > It has following issues, which prevents its usage in certain scenario. > 1) Developer may not use their own custom scripts with different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. 
> > Proposed solution: > 1) Developer should be given option to set their own custom script, which may > be located at any path. > 2) Only in case the parameter is not set, the default should be taken from > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700896#comment-16700896 ] Marcelo Vanzin commented on SPARK-26190: Just run your script without SparkLauncher? e.g. using {{Runtime.getRuntime().exec()}}. Or create a fake SPARK_HOME that is mostly symlinks to the original SPARK_HOME, and has your custom spark-submit. I don't think this is a good thing to have in Spark. It's a pretty obscure use case and very easy for people to do the wrong thing and blame Spark. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of > org.apache.spark.launcher.SparkLauncher class: > {code:java} > // code placeholder > private ProcessBuilder createBuilder() { > List cmd = new ArrayList(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > > It has following issues, which prevents its usage in certain scenario. > 1) Developer may not use their own custom scripts with different name. They > are forced to use the one shipped with the installation. Overwriting that may > not be the option, when it is not allowed to alter the original installation. > 2) The code expect the script to be present at "SPARK_HOME/bin" folder. 
> 3) The 'createBuilder()' method is private and hence, extending the > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developer should be given option to set their own custom script, which may > be located at any path. > 2) Only in case the parameter is not set, the default should be taken from > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
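The workaround suggested above, bypassing SparkLauncher and running the wrapper directly, might look roughly like this. The script path and arguments in the usage note are placeholders for illustration; only ProcessBuilder itself is standard Java.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;

// Sketch of the suggested workaround: instead of SparkLauncher, start a
// custom wrapper script (which itself invokes spark-submit) directly
// with ProcessBuilder and forward its output.
public class DirectSubmit {
    public static int run(List<String> command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // forward the script's output
            }
        }
        return p.waitFor(); // exit code of the wrapper script
    }
}
```

A caller would invoke it with a hypothetical wrapper path, e.g. `DirectSubmit.run(List.of("/opt/tools/secure-submit.sh", "--class", "com.example.Main", "app.jar"))`, which is exactly the kind of direct invocation the comment above describes.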
[jira] [Comment Edited] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700892#comment-16700892 ] Gyanendra Dwivedi edited comment on SPARK-26190 at 11/27/18 7:23 PM: - [~vanzin] I have a custom wrapper script on top of the spark-submit script, which performs certain security checks before it calls the built-in spark-submit script to submit the job. I have to use this script to launch the Spark job from a Java program. Do I have any other option? Kindly help. EDIT: The custom script is not located under SPARK_HOME/bin. was (Author: gm_dwivedi): [~vanzin] I have a wrapper custom script on top of spark-submit script, which has certain security check before it calls the built-in spark-submit script to submit the job. I have to use this script to launch the spark job using a Java program. Do I have any other option, kindly help. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of the > org.apache.spark.launcher.SparkLauncher class: > {code:java} > private ProcessBuilder createBuilder() { > List<String> cmd = new ArrayList<>(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom script with a different name. They > are forced to use the one shipped with the installation, and overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given the option to set their own custom script, which may > be located at any path. > 2) Only when the parameter is not set should the default be taken from the > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyanendra Dwivedi updated SPARK-26190: -- Description: Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class: {code:java} private ProcessBuilder createBuilder() { List<String> cmd = new ArrayList<>(); String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit"; cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script})); cmd.addAll(this.builder.buildSparkSubmitArgs()); .. .. }{code} This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom script with a different name. They are forced to use the one shipped with the installation, and overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given the option to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default be taken from the SPARK_HOME/bin folder. was: Currently the script name is hard-coded in the 'createBuilder()' method of org.apache.spark.launcher.SparkLauncher class: private ProcessBuilder createBuilder() { List cmd = new ArrayList(); String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit"; cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script})); cmd.addAll(this.builder.buildSparkSubmitArgs()); . } It has following issues, which prevents its usage in certain scenario. 1) Developer may not use their own custom scripts with different name. They are forced to use the one shipped with the installation. Overwriting that may not be the option, when it is not allowed to alter the original installation. 2) The code expect the script to be present at "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private and hence, extending the 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developer should be given option to set their own custom script, which may be located at any path. 2) Only in case the parameter is not set, the default should be taken from SPARK_HOME/bin folder. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of the > org.apache.spark.launcher.SparkLauncher class: > {code:java} > private ProcessBuilder createBuilder() { > List<String> cmd = new ArrayList<>(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > .. > .. > }{code} > > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom script with a different name. They > are forced to use the one shipped with the installation, and overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given the option to set their own custom script, which may > be located at any path. > 2) Only when the parameter is not set should the default be taken from the > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
[ https://issues.apache.org/jira/browse/SPARK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700890#comment-16700890 ] Marcelo Vanzin commented on SPARK-26190: Based solely on what you wrote here, I'm leaning towards closing this. SparkLauncher is a programmatic API around spark-submit, not around your custom script. If you have a custom script, you can call it without using SparkLauncher. > SparkLauncher: Allow users to set their own submitter script instead of > hardcoded spark-submit > -- > > Key: SPARK-26190 > URL: https://issues.apache.org/jira/browse/SPARK-26190 > Project: Spark > Issue Type: Improvement > Components: Java API, Spark Core, Spark Submit >Affects Versions: 2.1.0 > Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) >Reporter: Gyanendra Dwivedi >Priority: Major > > Currently the script name is hard-coded in the 'createBuilder()' method of the > org.apache.spark.launcher.SparkLauncher class: > > private ProcessBuilder createBuilder() { > List<String> cmd = new ArrayList<>(); > String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : > "spark-submit"; > cmd.add(CommandBuilderUtils.join(File.separator, new > String[]{this.builder.getSparkHome(), "bin", script})); > cmd.addAll(this.builder.buildSparkSubmitArgs()); > > . > } > > This has the following issues, which prevent its usage in certain scenarios: > 1) Developers may not use their own custom script with a different name. They > are forced to use the one shipped with the installation, and overwriting that may > not be an option when altering the original installation is not allowed. > 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. > 3) The 'createBuilder()' method is private, and hence extending > 'org.apache.spark.launcher.SparkLauncher' is not an option. > > Proposed solution: > 1) Developers should be given the option to set their own custom script, which may > be located at any path. > 2) Only when the parameter is not set should the default be taken from the > SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26188) Spark 2.4.0 Partitioning behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. Here is a log sample of this behavior from one of our jobs: 2.4.0: {code:java} 18/11/27 14:02:27 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=00/part-00061-hashredacted.parquet, range: 0-662, partition values: [0] 18/11/27 14:02:28 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ef/part-00034-hashredacted.parquet, range: 0-662, partition values: [ef] 18/11/27 14:02:29 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=4a/part-00151-hashredacted.parquet, range: 0-662, partition values: [4a] 18/11/27 14:02:30 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=74/part-00180-hashredacted.parquet, range: 0-662, partition values: [74] 18/11/27 14:02:32 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=f5/part-00156-hashredacted.parquet, range: 0-662, partition values: [f5] 18/11/27 14:02:33 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=50/part-00195-hashredacted.parquet, range: 0-662, partition values: [50] 18/11/27 14:02:34 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=70/part-00054-hashredacted.parquet, range: 0-662, partition values: [70] 18/11/27 14:02:35 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=b9/part-00012-hashredacted.parquet, range: 0-662, partition values: [b9] 18/11/27 14:02:37 INFO 
FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=d2/part-00016-hashredacted.parquet, range: 0-662, partition values: [d2] 18/11/27 14:02:38 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=51/part-3-hashredacted.parquet, range: 0-662, partition values: [51] 18/11/27 14:02:39 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=84/part-00135-hashredacted.parquet, range: 0-662, partition values: [84] 18/11/27 14:02:40 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=b5/part-00190-hashredacted.parquet, range: 0-662, partition values: [b5] 18/11/27 14:02:41 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=88/part-00143-hashredacted.parquet, range: 0-662, partition values: [88] 18/11/27 14:02:42 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=4d/part-00120-hashredacted.parquet, range: 0-662, partition values: [4.0] 18/11/27 14:02:43 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ac/part-00119-hashredacted.parquet, range: 0-662, partition values: [ac] 18/11/27 14:02:44 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=24/part-00139-hashredacted.parquet, range: 0-662, partition values: [24] 18/11/27 14:02:45 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=fd/part-00167-hashredacted.parquet, range: 0-662, partition values: [fd] 18/11/27 14:02:46 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=52/part-00033-hashredacted.parquet, range: 0-662, partition values: [52] 18/11/27 14:02:47 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ab/part-00083-hashredacted.parquet, range: 0-662, partition values: [ab] 18/11/27 14:02:48 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=f8/part-00018-hashredacted.parquet, range: 0-662, 
partition values: [f8] 18/11/27 14:02:49 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=7a/part-00093-hashredacted.parquet, range: 0-662, partition values: [7a] 18/11/27 14:02:50 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ba/part-00020-hashredacted.parquet, range: 0-662, partition values: [ba] 18/11/27 14:02:51 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=2d/part-00085-hashredacted.parquet, range: 0-662, partition values: [2.0] 18/11/27 14:02:52 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=03/part-00099-hashredacted.parquet, range: 0-662, partition values: [3] 18/11/27 14:02:53 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=57/part-00196-hashredacted.parquet, range: 0-662, partition values: [57] 18/11/27 14:02:54 INFO FileSc
[jira] [Created] (SPARK-26190) SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit
Gyanendra Dwivedi created SPARK-26190: - Summary: SparkLauncher: Allow users to set their own submitter script instead of hardcoded spark-submit Key: SPARK-26190 URL: https://issues.apache.org/jira/browse/SPARK-26190 Project: Spark Issue Type: Improvement Components: Java API, Spark Core, Spark Submit Affects Versions: 2.1.0 Environment: Apache Spark 2.0.1 on yarn cluster (MapR distribution) Reporter: Gyanendra Dwivedi Currently the script name is hard-coded in the 'createBuilder()' method of the org.apache.spark.launcher.SparkLauncher class: private ProcessBuilder createBuilder() { List<String> cmd = new ArrayList<>(); String script = CommandBuilderUtils.isWindows() ? "spark-submit.cmd" : "spark-submit"; cmd.add(CommandBuilderUtils.join(File.separator, new String[]{this.builder.getSparkHome(), "bin", script})); cmd.addAll(this.builder.buildSparkSubmitArgs()); . } This has the following issues, which prevent its usage in certain scenarios: 1) Developers may not use their own custom script with a different name. They are forced to use the one shipped with the installation, and overwriting that may not be an option when altering the original installation is not allowed. 2) The code expects the script to be present in the "SPARK_HOME/bin" folder. 3) The 'createBuilder()' method is private, and hence extending 'org.apache.spark.launcher.SparkLauncher' is not an option. Proposed solution: 1) Developers should be given the option to set their own custom script, which may be located at any path. 2) Only when the parameter is not set should the default be taken from the SPARK_HOME/bin folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
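The configurable-script behavior requested above can be sketched outside SparkLauncher entirely. The following is a minimal, hypothetical helper mirroring what 'createBuilder()' does, but with an optional custom script path; the names CustomLauncher, buildCommand, and /usr/local/bin/secure-submit.sh are invented for illustration and are not part of any Spark API:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a launcher command builder with a configurable
// submitter script, falling back to SPARK_HOME/bin/spark-submit when
// no custom script is given (the behavior proposed in the ticket).
public class CustomLauncher {
    static List<String> buildCommand(String sparkHome, String customScript,
                                     List<String> submitArgs) {
        List<String> cmd = new ArrayList<>();
        if (customScript != null) {
            // Use the caller-supplied wrapper script, wherever it lives.
            cmd.add(customScript);
        } else {
            // Default: SPARK_HOME/bin/spark-submit, as SparkLauncher does today.
            cmd.add(String.join(File.separator, sparkHome, "bin", "spark-submit"));
        }
        cmd.addAll(submitArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> submitArgs =
            Arrays.asList("--class", "com.example.App", "app.jar");
        // With a hypothetical security-checking wrapper script:
        System.out.println(
            buildCommand("/opt/spark", "/usr/local/bin/secure-submit.sh", submitArgs));
        // Default behavior when no custom script is configured:
        System.out.println(buildCommand("/opt/spark", null, submitArgs));
    }
}
```

The resulting list can be handed to a ProcessBuilder (e.g. new ProcessBuilder(cmd).inheritIO().start()), which is essentially the workaround suggested in the comments: invoke the custom wrapper script directly rather than routing it through SparkLauncher.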
[jira] [Updated] (SPARK-26188) Spark 2.4.0 Partitioning behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Summary: Spark 2.4.0 Partitioning behavior breaks backwards compatibility (was: Spark 2.4.0 behavior breaks backwards compatibility) > Spark 2.4.0 Partitioning behavior breaks backwards compatibility > > > Key: SPARK-26188 > URL: https://issues.apache.org/jira/browse/SPARK-26188 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Damien Doucet-Girard >Priority: Minor > > My team uses spark to partition and output parquet files to amazon S3. We > typically use 256 partitions, from 00 to ff. > We've observed that in spark 2.3.2 and prior, it reads the partitions as > strings by default. However, in spark 2.4.0 and later, the type of each > partition is inferred by default, and partitions such as 00 become 0 and 4d > become 4.0. > Here is a log sample of this behavior from one of our jobs: > 2.4.0: > {code:java} > 18/11/27 14:02:27 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=00/part-00061-hashredacted.parquet, > range: 0-662, partition values: [0] > 18/11/27 14:02:28 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=ef/part-00034-hashredacted.parquet, > range: 0-662, partition values: [ef] > 18/11/27 14:02:29 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=4a/part-00151-hashredacted.parquet, > range: 0-662, partition values: [4a] > 18/11/27 14:02:30 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=74/part-00180-hashredacted.parquet, > range: 0-662, partition values: [74] > 18/11/27 14:02:32 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=f5/part-00156-hashredacted.parquet, > range: 0-662, partition values: [f5] > 18/11/27 14:02:33 INFO FileScanRDD: Reading File path: > 
s3a://bucketnamereadacted/ddgirard/suffix=50/part-00195-hashredacted.parquet, > range: 0-662, partition values: [50] > 18/11/27 14:02:34 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=70/part-00054-hashredacted.parquet, > range: 0-662, partition values: [70] > 18/11/27 14:02:35 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=b9/part-00012-hashredacted.parquet, > range: 0-662, partition values: [b9] > 18/11/27 14:02:37 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=d2/part-00016-hashredacted.parquet, > range: 0-662, partition values: [d2] > 18/11/27 14:02:38 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=51/part-3-hashredacted.parquet, > range: 0-662, partition values: [51] > 18/11/27 14:02:39 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=84/part-00135-hashredacted.parquet, > range: 0-662, partition values: [84] > 18/11/27 14:02:40 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=b5/part-00190-hashredacted.parquet, > range: 0-662, partition values: [b5] > 18/11/27 14:02:41 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=88/part-00143-hashredacted.parquet, > range: 0-662, partition values: [88] > 18/11/27 14:02:42 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=4d/part-00120-hashredacted.parquet, > range: 0-662, partition values: [4.0] > 18/11/27 14:02:43 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=ac/part-00119-hashredacted.parquet, > range: 0-662, partition values: [ac] > 18/11/27 14:02:44 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=24/part-00139-hashredacted.parquet, > range: 0-662, partition values: [24] > 18/11/27 14:02:45 INFO FileScanRDD: Reading File path: > 
s3a://bucketnamereadacted/ddgirard/suffix=fd/part-00167-hashredacted.parquet, > range: 0-662, partition values: [fd] > 18/11/27 14:02:46 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=52/part-00033-hashredacted.parquet, > range: 0-662, partition values: [52] > 18/11/27 14:02:47 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=ab/part-00083-hashredacted.parquet, > range: 0-662, partition values: [ab] > 18/11/27 14:02:48 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=f8/part-00018-hashredacted.parquet, > range: 0-662, partition values: [f8] > 18/11/27 14:02:49 INFO FileScanRDD: Reading File path: > s3a://bucketnamereadacted/ddgirard/suffix=7a/part-00093-hashredacted.parquet, > range: 0-662, partition values: [7a] > 18/11/27 14:02
[jira] [Updated] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. Here is a log sample of this behavior from one of our jobs: 2.4.0: {code:java} 18/11/27 14:02:27 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=00/part-00061-hashredacted.parquet, range: 0-662, partition values: [0] 18/11/27 14:02:28 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ef/part-00034-hashredacted.parquet, range: 0-662, partition values: [ef] 18/11/27 14:02:29 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=4a/part-00151-hashredacted.parquet, range: 0-662, partition values: [4a] 18/11/27 14:02:30 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=74/part-00180-hashredacted.parquet, range: 0-662, partition values: [74] 18/11/27 14:02:32 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=f5/part-00156-hashredacted.parquet, range: 0-662, partition values: [f5] 18/11/27 14:02:33 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=50/part-00195-hashredacted.parquet, range: 0-662, partition values: [50] 18/11/27 14:02:34 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=70/part-00054-hashredacted.parquet, range: 0-662, partition values: [70] 18/11/27 14:02:35 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=b9/part-00012-hashredacted.parquet, range: 0-662, partition values: [b9] 18/11/27 14:02:37 INFO 
FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=d2/part-00016-hashredacted.parquet, range: 0-662, partition values: [d2] 18/11/27 14:02:38 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=51/part-3-hashredacted.parquet, range: 0-662, partition values: [51] 18/11/27 14:02:39 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=84/part-00135-hashredacted.parquet, range: 0-662, partition values: [84] 18/11/27 14:02:40 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=b5/part-00190-hashredacted.parquet, range: 0-662, partition values: [b5] 18/11/27 14:02:41 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=88/part-00143-hashredacted.parquet, range: 0-662, partition values: [88] 18/11/27 14:02:42 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=4d/part-00120-hashredacted.parquet, range: 0-662, partition values: [4.0] 18/11/27 14:02:43 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ac/part-00119-hashredacted.parquet, range: 0-662, partition values: [ac] 18/11/27 14:02:44 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=24/part-00139-hashredacted.parquet, range: 0-662, partition values: [24] 18/11/27 14:02:45 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=fd/part-00167-hashredacted.parquet, range: 0-662, partition values: [fd] 18/11/27 14:02:46 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=52/part-00033-hashredacted.parquet, range: 0-662, partition values: [52] 18/11/27 14:02:47 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ab/part-00083-hashredacted.parquet, range: 0-662, partition values: [ab] 18/11/27 14:02:48 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=f8/part-00018-hashredacted.parquet, range: 0-662, 
partition values: [f8] 18/11/27 14:02:49 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=7a/part-00093-hashredacted.parquet, range: 0-662, partition values: [7a] 18/11/27 14:02:50 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=ba/part-00020-hashredacted.parquet, range: 0-662, partition values: [ba] 18/11/27 14:02:51 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=2d/part-00085-hashredacted.parquet, range: 0-662, partition values: [2.0] 18/11/27 14:02:52 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=03/part-00099-hashredacted.parquet, range: 0-662, partition values: [3] 18/11/27 14:02:53 INFO FileScanRDD: Reading File path: s3a://bucketnamereadacted/ddgirard/suffix=57/part-00196-hashredacted.parquet, range: 0-662, partition values: [57] 18/11/27 14:02:54 INFO FileScan
[jira] [Updated] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses Spark to partition and output Parquet files to Amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in Spark 2.3.2 and prior, it reads the partitions as strings by default. However, in Spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default: {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag: [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} This conf's default value is true: {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to
preserve backwards compatibility and change the default conf value to false. was: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replace with a config flag https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133 {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value to false. 
> Spark 2.4.0 behavior breaks backwards compatibility > --- > > Key: SPARK-26188 > URL: https://issues.apache.org/jira/browse/SPARK-26188 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Damien Doucet-Girard >Priority: Minor > > My team uses spark to partition and output parquet files to amazon S3. We > typically use 256 partitions, from 00 to ff. > We've observed that in spark 2.3.2 and prior, it reads the partitions as > strings by default. However, in spark 2.4.0 and later, the type of each > partition is inferred by default, and partitions such as 00 become 0 and 4d > become 4.0. > After some investigation, we've isolated the issue to > > [https://g
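The 0 and 4.0 values reported above are consistent with plain JVM numeric parsing, which partition type inference ultimately relies on: Double.parseDouble accepts leading zeros and the 'd'/'f' floating-point literal suffixes, so hex-style directory names like "00" and "4d" look numeric, while "ef" does not. The snippet below is an illustrative sketch of that effect only, not Spark's actual inference code path; PartitionInference and infer are hypothetical names:

```java
// Illustrates why hex-style partition names can be mistaken for numbers on
// the JVM. This is NOT Spark's actual inference logic, just the underlying
// parsing behavior it inherits from java.lang.Double.
public class PartitionInference {
    // Returns what a naive numeric-inference pass would produce for a raw
    // partition directory value.
    static Object infer(String raw) {
        try {
            // "00" parses as 0.0; "4d" parses as 4.0 ('d' is a double suffix).
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            // Values like "ef" fail to parse and stay strings.
            return raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(infer("00")); // 0.0
        System.out.println(infer("4d")); // 4.0
        System.out.println(infer("ef")); // ef
    }
}
```

As the issue description notes, setting spark.sql.sources.partitionColumnTypeInference.enabled to false makes Spark 2.4.0 read partition values as strings again, matching the 2.3.2 behavior.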
[jira] [Commented] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700880#comment-16700880 ] Xiao Li commented on SPARK-26189: - cc [~huaxingao] > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26189) Fix the doc of unionAll in SparkR
[ https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26189: Description: We should fix the doc of unionAll in SparkR. See the discussion: https://github.com/apache/spark/pull/23131/files#r236760822 > Fix the doc of unionAll in SparkR > - > > Key: SPARK-26189 > URL: https://issues.apache.org/jira/browse/SPARK-26189 > Project: Spark > Issue Type: Documentation > Components: R >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > We should fix the doc of unionAll in SparkR. See the discussion: > https://github.com/apache/spark/pull/23131/files#r236760822 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26189) Fix the doc of unionAll in SparkR
Xiao Li created SPARK-26189: --- Summary: Fix the doc of unionAll in SparkR Key: SPARK-26189 URL: https://issues.apache.org/jira/browse/SPARK-26189 Project: Spark Issue Type: Documentation Components: R Affects Versions: 3.0.0 Reporter: Xiao Li -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value to false. was: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value to false. > Spark 2.4.0 behavior breaks backwards compatibility > --- > > Key: SPARK-26188 > URL: https://issues.apache.org/jira/browse/SPARK-26188 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Damien Doucet-Girard >
[jira] [Updated] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133 {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value to false. was: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value 
to false. > Spark 2.4.0 behavior breaks backwards compatibility > --- > > Key: SPARK-26188 > URL: https://issues.apache.org/jira/browse/SPARK-26188 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Damien Doucet-Girard >Priority: Minor > > My team uses spark to partition and output parquet files to amazon S3. We > typically use 256 partitions, from 00 to ff. > We've observed that in spark 2.3.2 and prior, it reads the partitions as > strings by default. However, in spark 2.4.0 and later, the type of each > partition is inferred by default, and partitions such as 00 become 0 and 4d > become 4.0. > After some investigation, we've isolated the iss
[jira] [Updated] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
[ https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Doucet-Girard updated SPARK-26188: - Description: My team uses spark to partition and output parquet files to amazon S3. We typically use 256 partitions, from 00 to ff. We've observed that in spark 2.3.2 and prior, it reads the partitions as strings by default. However, in spark 2.4.0 and later, the type of each partition is inferred by default, and partitions such as 00 become 0 and 4d become 4.0. After some investigation, we've isolated the issue to [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] In the inferPartitioning method, 2.3.2 sets the type inference to false by default (lines 132-136): {code:java} val spec = PartitioningUtils.parsePartitions( leafDirs, typeInference = false, basePaths = basePaths, timeZoneId = timeZoneId){code} However, in version 2.4.0, the typeInference flag has been replaced with a config flag [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] {code:java} val inferredPartitionSpec = PartitioningUtils.parsePartitions( leafDirs, typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, basePaths = basePaths, timeZoneId = timeZoneId){code} And this conf's default value is true {code:java} val PARTITION_COLUMN_TYPE_INFERENCE = buildConf("spark.sql.sources.partitionColumnTypeInference.enabled") .doc("When true, automatically infer the data types for partitioned columns.") .booleanConf .createWithDefault(true){code} [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L636-L640] I was wondering if a bug report would be appropriate to preserve backwards compatibility and change the default conf value to false. > Spark 2.4.0 behavior breaks backwards compatibility > --- > > Key: SPARK-26188 > URL: https://issues.apache.org/jira/browse/SPARK-26188 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Damien Doucet-Girard >Priority: Minor > > My team uses spark to partition and output parquet files to amazon S3. We > typically use 256 partitions, from 00 to ff. > We've observed that in spark 2.3.2 and prior, it reads the partitions as > strings by default. However, in spark 2.4.0 and later, the type of each > partition is inferred by default, and partitions such as 00 become 0 and 4d > become 4.0. > After some investigation, we've isolated the issue to > [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L132-L136] > > In the inferPartitioning method, 2.3.2 sets the type inference to false by > default (lines 132-136): > {code:java} > val spec = PartitioningUtils.parsePartitions( > leafDirs, > typeInference = false, > basePaths = basePaths, > timeZoneId = timeZoneId){code} > However, in version 2.4.0, the typeInference flag has been replaced with a > config flag > [https://github.com/apache/spark/blob/075447b3965489ffba4e6afb2b120880bc307505/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L129-L133] > {code:java} > val inferredPartitionSpec = PartitioningUtils.parsePartitions( > leafDirs, > typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled, > basePaths = basePaths, > timeZoneId = timeZoneId){code} > And this conf's default value is true > {code:java} > val PARTITION_COLUMN_TYPE_INFERENCE = > buildConf("spark.sql.so
[jira] [Created] (SPARK-26188) Spark 2.4.0 behavior breaks backwards compatibility
Damien Doucet-Girard created SPARK-26188: Summary: Spark 2.4.0 behavior breaks backwards compatibility Key: SPARK-26188 URL: https://issues.apache.org/jira/browse/SPARK-26188 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Damien Doucet-Girard -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
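For jobs that depend on the pre-2.4 string behavior described above, the conf quoted in the report can be disabled explicitly rather than waiting for any default change. A sketch of that opt-out, shown here in `spark-defaults.conf` form (the property name is taken from the SQLConf definition quoted above; the placement in a defaults file is an assumption, and the same flag can equally be set per session before reading the partitioned path):

```
spark.sql.sources.partitionColumnTypeInference.enabled  false
```

With this set, partition directory names such as 00 and 4d are read back as the strings 00 and 4d, matching the 2.3.2 behavior the reporter describes.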
[jira] [Updated] (SPARK-26187) Stream-stream left outer join returns outer nulls for already matched rows
[ https://issues.apache.org/jira/browse/SPARK-26187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Chernikov updated SPARK-26187: Description: This is basically the same issue as SPARK-26154, but with a slightly simpler and more concrete reproducible example: {code:java} val rateStream = session.readStream .format("rate") .option("rowsPerSecond", 1) .option("numPartitions", 1) .load() import org.apache.spark.sql.functions._ val fooStream = rateStream .select(col("value").as("fooId"), col("timestamp").as("fooTime")) val barStream = rateStream // Introduce misses for ease of debugging .where(col("value") % 2 === 0) .select(col("value").as("barId"), col("timestamp").as("barTime")){code} If barStream is configured to happen earlier than fooStream, based on the time range condition, then everything is all right: no previously matched records are flushed with outer NULLs: {code:java} val query = fooStream .withWatermark("fooTime", "5 seconds") .join( barStream.withWatermark("barTime", "5 seconds"), expr(""" barId = fooId AND fooTime >= barTime AND fooTime <= barTime + interval 5 seconds """), joinType = "leftOuter" ) .writeStream .format("console") .option("truncate", false) .start(){code} It's easy to observe that only odd rows are flushed with NULLs on the right: {code:java} [info] Batch: 1 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |0 |2018-11-27 13:12:34.976|0 |2018-11-27 13:12:34.976| [info] |6 |2018-11-27 13:12:40.976|6 |2018-11-27 13:12:40.976| [info] |10 |2018-11-27 13:12:44.976|10 |2018-11-27 13:12:44.976| [info] |8 |2018-11-27 13:12:42.976|8 |2018-11-27 13:12:42.976| [info] |2 |2018-11-27 13:12:36.976|2 |2018-11-27 13:12:36.976| [info] |4 |2018-11-27 13:12:38.976|4 |2018-11-27 13:12:38.976| [info] +-+---+-+---+ [info] Batch: 2 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |1 |2018-11-27 13:12:35.976|null |null | [info] |3 |2018-11-27 13:12:37.976|null |null | 
[info] |12 |2018-11-27 13:12:46.976|12 |2018-11-27 13:12:46.976| [info] |18 |2018-11-27 13:12:52.976|18 |2018-11-27 13:12:52.976| [info] |14 |2018-11-27 13:12:48.976|14 |2018-11-27 13:12:48.976| [info] |20 |2018-11-27 13:12:54.976|20 |2018-11-27 13:12:54.976| [info] |16 |2018-11-27 13:12:50.976|16 |2018-11-27 13:12:50.976| [info] +-+---+-+---+ [info] Batch: 3 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |26 |2018-11-27 13:13:00.976|26 |2018-11-27 13:13:00.976| [info] |22 |2018-11-27 13:12:56.976|22 |2018-11-27 13:12:56.976| [info] |7 |2018-11-27 13:12:41.976|null |null | [info] |9 |2018-11-27 13:12:43.976|null |null | [info] |28 |2018-11-27 13:13:02.976|28 |2018-11-27 13:13:02.976| [info] |5 |2018-11-27 13:12:39.976|null |null | [info] |11 |2018-11-27 13:12:45.976|null |null | [info] |13 |2018-11-27 13:12:47.976|null |null | [info] |24 |2018-11-27 13:12:58.976|24 |2018-11-27 13:12:58.976| [info] +-+---+-+---+ {code} On the other hand, if we switch the ordering and now fooStream is happening earlier based on time range condition: {code:java} val query = fooStream .withWatermark("fooTime", "5 seconds") .join( barStream.withWatermark("barTime", "5 seconds"), expr(""" barId = fooId AND barTime >= fooTime AND barTime <= fooTime + interval 5 seconds """), joinType = "leftOuter" ) .writeStream .format("console") .option("truncate", false) .start(){code} Some, not all, previously matched records (with even IDs) are omitted with outer NULLs along with all unmatched records (with odd IDs): {code:java} [info] Batch: 1 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |0 |2018-11-27 13:26:11.463|0 |2018-11-27 13:26:11.463| [info] |6 |2018-11-27 13:26:17.463|6 |2018-11-27 13:26:17.463| [info] |10 |2018-11-27 13:26:21.463|10 |2018-11-27 13:26:21.463| [info] |8 |2018-11-27 13:26:19.463|8 |2018-11-27 13:26:19.463| [info] |2 |2
[jira] [Created] (SPARK-26187) Stream-stream left outer join returns outer nulls for already matched rows
Pavel Chernikov created SPARK-26187: --- Summary: Stream-stream left outer join returns outer nulls for already matched rows Key: SPARK-26187 URL: https://issues.apache.org/jira/browse/SPARK-26187 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.3.2 Reporter: Pavel Chernikov This is basically the same issue as [SPARK-26154|https://issues.apache.org/jira/browse/SPARK-26154], but with a slightly simpler and more concrete reproducible example: {code:java} val rateStream = session.readStream .format("rate") .option("rowsPerSecond", 1) .option("numPartitions", 1) .load() import org.apache.spark.sql.functions._ val fooStream = rateStream .select(col("value").as("fooId"), col("timestamp").as("fooTime")) val barStream = rateStream // Introduce misses for ease of debugging .where(col("value") % 2 === 0) .select(col("value").as("barId"), col("timestamp").as("barTime")){code} If barStream is configured to happen earlier than fooStream, based on the time range condition, then everything is all right: no previously matched records are flushed with outer NULLs: {code:java} val query = fooStream .withWatermark("fooTime", "5 seconds") .join( barStream.withWatermark("barTime", "5 seconds"), expr(""" barId = fooId AND fooTime >= barTime AND fooTime <= barTime + interval 5 seconds """), joinType = "leftOuter" ) .writeStream .format("console") .option("truncate", false) .start(){code} It's easy to observe that only odd rows are flushed with NULLs on the right: 
{code:java} [info] Batch: 1 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |0 |2018-11-27 13:12:34.976|0 |2018-11-27 13:12:34.976| [info] |6 |2018-11-27 13:12:40.976|6 |2018-11-27 13:12:40.976| [info] |10 |2018-11-27 13:12:44.976|10 |2018-11-27 13:12:44.976| [info] |8 |2018-11-27 13:12:42.976|8 |2018-11-27 13:12:42.976| [info] |2 |2018-11-27 13:12:36.976|2 |2018-11-27 13:12:36.976| [info] |4 |2018-11-27 13:12:38.976|4 |2018-11-27 13:12:38.976| [info] +-+---+-+---+ [info] Batch: 2 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |1 |2018-11-27 13:12:35.976|null |null | [info] |3 |2018-11-27 13:12:37.976|null |null | [info] |12 |2018-11-27 13:12:46.976|12 |2018-11-27 13:12:46.976| [info] |18 |2018-11-27 13:12:52.976|18 |2018-11-27 13:12:52.976| [info] |14 |2018-11-27 13:12:48.976|14 |2018-11-27 13:12:48.976| [info] |20 |2018-11-27 13:12:54.976|20 |2018-11-27 13:12:54.976| [info] |16 |2018-11-27 13:12:50.976|16 |2018-11-27 13:12:50.976| [info] +-+---+-+---+ [info] Batch: 3 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+---+ [info] |26 |2018-11-27 13:13:00.976|26 |2018-11-27 13:13:00.976| [info] |22 |2018-11-27 13:12:56.976|22 |2018-11-27 13:12:56.976| [info] |7 |2018-11-27 13:12:41.976|null |null | [info] |9 |2018-11-27 13:12:43.976|null |null | [info] |28 |2018-11-27 13:13:02.976|28 |2018-11-27 13:13:02.976| [info] |5 |2018-11-27 13:12:39.976|null |null | [info] |11 |2018-11-27 13:12:45.976|null |null | [info] |13 |2018-11-27 13:12:47.976|null |null | [info] |24 |2018-11-27 13:12:58.976|24 |2018-11-27 13:12:58.976| [info] +-+---+-+---+ {code} On the other hand, if we switch the ordering and now fooStream is happening earlier based on time range condition: {code:java} val query = fooStream .withWatermark("fooTime", "5 seconds") .join( barStream.withWatermark("barTime", "5 seconds"), expr(""" barId = fooId AND barTime >= fooTime AND barTime <= fooTime + interval 
5 seconds """), joinType = "leftOuter" ) .writeStream .format("console") .option("truncate", false) .start(){code} Some, but not all, previously matched records (with even IDs) are omitted with outer NULLs along with all unmatched records (with odd IDs): {code:java} [info] Batch: 1 [info] +-+---+-+---+ [info] |fooId|fooTime |barId|barTime | [info] +-+---+-+-
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700836#comment-16700836 ] Adrian Wang commented on SPARK-26155: - [~Jk_Self] can you also test this on Spark 2.4? > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486 & 487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark2.3 when running TPC-DS on SKX 8180. Several queries have serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark2.1 on 3TB data. We investigated > this problem and figured out the root cause is in community patch SPARK-21052 > which adds metrics to the hash join process. And the impacted code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines of code and 126 seconds > with them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API
[ https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700829#comment-16700829 ] Huaxin Gao commented on SPARK-21291: [~felixcheung] Is it OK with you if I modify the title for this Jira and open a new one for bucketBy? > R bucketBy partitionBy API > -- > > Key: SPARK-21291 > URL: https://issues.apache.org/jira/browse/SPARK-21291 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > partitionBy exists but it's for windowspec only -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
[ https://issues.apache.org/jira/browse/SPARK-26186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700813#comment-16700813 ] Apache Spark commented on SPARK-26186: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23158 > In progress applications with last updated time is lesser than the cleaning > interval are getting removed during cleaning logs > - > > Key: SPARK-26186 > URL: https://issues.apache.org/jira/browse/SPARK-26186 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: shahid >Priority: Major > > In-progress applications whose last updated time is within the cleaning > interval are getting deleted. > > Added a UT to test the scenario. > {code:java} > test("should not clean inprogress application with lastUpdated time less than > the maxTime") { > val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1) > val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6) > val maxAge = TimeUnit.DAYS.toMillis(7) > val clock = new ManualClock(0) > val provider = new FsHistoryProvider( > createTestConf().set("spark.history.fs.cleaner.maxAge", > s"${maxAge}ms"), clock) > val log = newLogFile("inProgressApp1", None, inProgress = true) > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")) > ) > clock.setTime(firstFileModifiedTime) > provider.checkForLogs() > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")), > SparkListenerJobStart(0, 1L, Nil, null) > ) > clock.setTime(secondFileModifiedTime) > provider.checkForLogs() > clock.setTime(TimeUnit.DAYS.toMillis(10)) > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")), > SparkListenerJobStart(0, 1L, Nil, null), > SparkListenerJobEnd(0, 1L, 
JobSucceeded) > ) > provider.checkForLogs() > // This should not trigger any cleanup > updateAndCheck(provider) { list => > list.size should be(1) > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
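The invariant the UT above exercises can be stated in a couple of lines. The following is an illustrative re-statement in Python, not the FsHistoryProvider code: a log should be eligible for cleaning only when its last-updated time has fallen outside the max-age window.

```python
def should_clean(last_updated_ms: int, now_ms: int, max_age_ms: int) -> bool:
    """A log is stale only if it was last updated before (now - maxAge)."""
    return last_updated_ms < now_ms - max_age_ms

DAY_MS = 24 * 60 * 60 * 1000

# Mirrors the timeline in the test: log last updated on day 6, cleaner runs
# on day 10 with a 7-day maxAge -> the log must NOT be cleaned.
assert not should_clean(6 * DAY_MS, 10 * DAY_MS, 7 * DAY_MS)

# A log untouched since day 1 would be outside the window and eligible.
assert should_clean(1 * DAY_MS, 10 * DAY_MS, 7 * DAY_MS)
```

The reported bug is, in these terms, an in-progress application being deleted even though its last-updated time still satisfies the first case.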
[jira] [Commented] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700817#comment-16700817 ] Apache Spark commented on SPARK-26184: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23158 > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700815#comment-16700815 ] Apache Spark commented on SPARK-26184: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23158 > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700816#comment-16700816 ] Apache Spark commented on SPARK-26184: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23158 > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26184: Assignee: Apache Spark > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: Apache Spark >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26184: Assignee: (was: Apache Spark) > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
[ https://issues.apache.org/jira/browse/SPARK-26186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26186: Assignee: (was: Apache Spark) > In progress applications with last updated time is lesser than the cleaning > interval are getting removed during cleaning logs > - > > Key: SPARK-26186 > URL: https://issues.apache.org/jira/browse/SPARK-26186 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: shahid >Priority: Major > > Inporgress applications with last updated time is withing the cleaning > interval are getting deleted. > > Added a UT to test the scenario. > {code:java} > test("should not clean inprogress application with lastUpdated time less the > maxTime") { > val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1) > val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6) > val maxAge = TimeUnit.DAYS.toMillis(7) > val clock = new ManualClock(0) > val provider = new FsHistoryProvider( > createTestConf().set("spark.history.fs.cleaner.maxAge", > s"${maxAge}ms"), clock) > val log = newLogFile("inProgressApp1", None, inProgress = true) > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")) > ) > clock.setTime(firstFileModifiedTime) > provider.checkForLogs() > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")), > SparkListenerJobStart(0, 1L, Nil, null) > ) > clock.setTime(secondFileModifiedTime) > provider.checkForLogs() > clock.setTime(TimeUnit.DAYS.toMillis(10)) > writeFile(log, true, None, > SparkListenerApplicationStart( > "inProgressApp1", Some("inProgressApp1"), 3L, "test", > Some("attempt1")), > SparkListenerJobStart(0, 1L, Nil, null), > SparkListenerJobEnd(0, 1L, JobSucceeded) > ) > provider.checkForLogs() > // This should not trigger any cleanup > updateAndCheck(provider) { list => > 
list.size should be(1) > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
[ https://issues.apache.org/jira/browse/SPARK-26186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26186: Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
[ https://issues.apache.org/jira/browse/SPARK-26186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700814#comment-16700814 ] Apache Spark commented on SPARK-26186: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23158 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-26185: --- Description: https://issues.apache.org/jira/browse/SPARK-24101 added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator. (was: --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator.) > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > https://issues.apache.org/jira/browse/SPARK-24101 added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
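To make concrete what a weightCol adds: with a weight column, each row contributes its weight to the metric rather than a flat count of one. Below is a minimal pure-Python sketch of a weighted accuracy computation; the helper name and data are illustrative only, not the Spark API:

```python
# Sketch: a weighted accuracy metric. With a weightCol, each
# (prediction, label) row contributes its weight instead of a count of 1.
def weighted_accuracy(predictions, labels, weights):
    """Weighted fraction of rows where the prediction matches the label."""
    total = sum(weights)
    correct = sum(w for p, y, w in zip(predictions, labels, weights) if p == y)
    return correct / total

preds   = [0, 1, 2]
labels  = [0, 1, 1]          # the third row is misclassified
weights = [1.0, 1.0, 0.5]    # ...and carries half the weight of the others
print(weighted_accuracy(preds, labels, weights))  # 0.8
```

Unweighted, the same predictions would score 2/3; down-weighting the misclassified row raises the metric to 2.0/2.5 = 0.8, which is exactly the per-row influence a weightCol is meant to provide.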
[jira] [Created] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
shahid created SPARK-26186:
-------------------------------

             Summary: In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
                 Key: SPARK-26186
                 URL: https://issues.apache.org/jira/browse/SPARK-26186
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0, 3.0.0
            Reporter: shahid


In-progress applications whose last updated time is within the cleaning interval are getting deleted.

Added a unit test for the scenario:

{code:java}
test("should not clean inprogress application with lastUpdated time less the maxTime") {
  val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
  val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
  val maxAge = TimeUnit.DAYS.toMillis(7)
  val clock = new ManualClock(0)
  val provider = new FsHistoryProvider(
    createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
  val log = newLogFile("inProgressApp1", None, inProgress = true)
  writeFile(log, true, None,
    SparkListenerApplicationStart(
      "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
  )
  clock.setTime(firstFileModifiedTime)
  provider.checkForLogs()
  writeFile(log, true, None,
    SparkListenerApplicationStart(
      "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1")),
    SparkListenerJobStart(0, 1L, Nil, null)
  )
  clock.setTime(secondFileModifiedTime)
  provider.checkForLogs()
  clock.setTime(TimeUnit.DAYS.toMillis(10))
  writeFile(log, true, None,
    SparkListenerApplicationStart(
      "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1")),
    SparkListenerJobStart(0, 1L, Nil, null),
    SparkListenerJobEnd(0, 1L, JobSucceeded)
  )
  provider.checkForLogs()
  // This should not trigger any cleanup
  updateAndCheck(provider) { list =>
    list.size should be(1)
  }
}
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
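The retention rule under test can be modeled in a few lines: a log should survive cleaning whenever its last-modified time falls within maxAge of the current clock, in-progress or not. A hedged sketch of that rule (function and field names are illustrative, not the FsHistoryProvider API):

```python
DAY_MS = 24 * 60 * 60 * 1000  # mirrors TimeUnit.DAYS.toMillis(1)

def clean_logs(logs, now_ms, max_age_ms):
    """Return the logs that survive cleaning: anything modified within
    max_age_ms of now_ms is retained, including in-progress apps."""
    cutoff = now_ms - max_age_ms
    return [log for log in logs if log["last_modified_ms"] >= cutoff]

logs = [
    {"app": "inProgressApp1", "last_modified_ms": 6 * DAY_MS},  # touched on day 6
    {"app": "finishedApp1",   "last_modified_ms": 1 * DAY_MS},  # touched on day 1
]

# At day 10 with a 7-day maxAge the cutoff is day 3: the in-progress app
# must survive, while the day-1 log is legitimately eligible for removal.
survivors = clean_logs(logs, now_ms=10 * DAY_MS, max_age_ms=7 * DAY_MS)
print([log["app"] for log in survivors])  # ['inProgressApp1']
```

The reported bug is the cleaner removing the in-progress entry even though it falls on the kept side of this cutoff, which is what the unit test above asserts against.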
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Attachment: (was: Screenshot from 2018-11-27 13-21-34.png) > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, screenshot-1.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Description: For inprogress application, last updated time is not getting updated. !Screenshot from 2018-11-27 13-21-34.png! !screenshot-1.png! was: For inprogress application, last updated time is not getting updated. > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 13-21-34.png, screenshot-1.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 13-21-34.png! > !screenshot-1.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26185: Assignee: Apache Spark > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-26185: --- Description: --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator. (was: -https://issues.apache.org/jira/browse/SPARK-24101- added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator.) > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700806#comment-16700806 ] Apache Spark commented on SPARK-26185: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/23157 > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-26185: --- Description: -https://issues.apache.org/jira/browse/SPARK-24101- added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator. (was: https://issues.apache.org/jira/browse/SPARK-24101 added weightCol in MulticlassClassificationEvaluator.scala. This Jira will add weightCol in python version of MulticlassClassificationEvaluator.) > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > -https://issues.apache.org/jira/browse/SPARK-24101- added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26185: Assignee: (was: Apache Spark) > add weightCol in python MulticlassClassificationEvaluator > - > > Key: SPARK-26185 > URL: https://issues.apache.org/jira/browse/SPARK-26185 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > --https://issues.apache.org/jira/browse/SPARK-24101-- added weightCol in > MulticlassClassificationEvaluator.scala. This Jira will add weightCol in > python version of MulticlassClassificationEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26186) In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
[ https://issues.apache.org/jira/browse/SPARK-26186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700804#comment-16700804 ] shahid commented on SPARK-26186: -- I will raise a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700799#comment-16700799 ]

Maxim Gekk commented on SPARK-23410:
------------------------------------

> Even if lineSeps is set, it is still necessary to identify the file bom charset.

For sure, encoding should be inferred from the BOM in any case. I just want to draw your attention to the case when lineSep is not set. We infer lineSep in UTF-8, and most likely should do the same for other encodings.

> In my opinion, we can try to read the first four bytes of the file on the executor side to identify the encoding of the file. Because once the charset of the file is determined, the charset of lineSeps is also determined.

For JSON, we create the JacksonParser on the driver side before file reading. For example: https://github.com/apache/spark/blob/e9af9460bc008106b670abac44a869721bfde42a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala#L105-L107 . Need to take this into account.

[~x1q1j1] Please open a PR, I am ready to review it.

> I know BOM is only the beginning of the file ..

[~hyukjin.kwon] I just double checked ;-)

> Unable to read jsons in charset different from UTF-8
> ----------------------------------------------------
>
>                 Key: SPARK-23410
>                 URL: https://issues.apache.org/jira/browse/SPARK-23410
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Maxim Gekk
>            Priority: Major
>         Attachments: utf16WithBOM.json
>
>
> Currently the JSON parser is forced to read JSON files in UTF-8. Such behavior breaks backward compatibility with Spark 2.2.1 and previous versions, which could read JSON files in UTF-16, UTF-32 and other encodings thanks to the auto-detection mechanism of the Jackson library. We need to give users back the ability to read JSON files in a specified charset and/or to detect the charset automatically, as before.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
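The "read the first four bytes" suggestion maps directly onto the standard BOM table, and it also illustrates the point that once the charset is known, the lineSep byte sequence follows from it. A minimal Python sketch of that inference (illustrative only; the actual fix would live in the Scala JSON datasource):

```python
import codecs

def detect_encoding_from_bom(first_bytes, default="utf-8"):
    """Infer a charset from a file's leading bytes via its BOM.
    The 4-byte UTF-32 marks must be tested before their 2-byte UTF-16
    prefixes, otherwise UTF-32 LE would be misread as UTF-16 LE."""
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8,     "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if first_bytes.startswith(bom):
            return name
    return default  # no BOM: fall back to a configured default

print(detect_encoding_from_bom(b"\xff\xfe{\x00"))     # utf-16-le
print(detect_encoding_from_bom(b"\xff\xfe\x00\x00"))  # utf-32-le
# Once the charset is determined, so are the line-separator bytes:
print("\n".encode("utf-16-le"))                       # b'\n\x00'
```

A BOM-less file falls through to the default, which matches the discussion above: with no BOM and no explicit encoding, the reader still needs a configured or inferred charset before it can split records on lineSep.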
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Attachment: Screenshot from 2018-11-27 23-22-38.png > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Description: For inprogress application, last updated time is not getting updated. !Screenshot from 2018-11-27 23-20-11.png! !Screenshot from 2018-11-27 23-22-38.png! was: For inprogress application, last updated time is not getting updated. !Screenshot from 2018-11-27 23-20-11.png! > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, Screenshot from > 2018-11-27 23-22-38.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! > !Screenshot from 2018-11-27 23-22-38.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Attachment: (was: screenshot-1.png) > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Attachment: Screenshot from 2018-11-27 23-20-11.png > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, screenshot-1.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 13-21-34.png! > !screenshot-1.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26184) Last updated time is not getting updated in the History Server UI
[ https://issues.apache.org/jira/browse/SPARK-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-26184: --- Description: For inprogress application, last updated time is not getting updated. !Screenshot from 2018-11-27 23-20-11.png! was: For inprogress application, last updated time is not getting updated. !Screenshot from 2018-11-27 13-21-34.png! !screenshot-1.png! > Last updated time is not getting updated in the History Server UI > - > > Key: SPARK-26184 > URL: https://issues.apache.org/jira/browse/SPARK-26184 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: Screenshot from 2018-11-27 23-20-11.png, screenshot-1.png > > > For inprogress application, last updated time is not getting updated. > !Screenshot from 2018-11-27 23-20-11.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org