[jira] [Created] (SPARK-48856) Use isolated JobArtifactSet for each spark session
lifulong created SPARK-48856: Summary: Use isolated JobArtifactSet for each spark session Key: SPARK-48856 URL: https://issues.apache.org/jira/browse/SPARK-48856 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: lifulong Consider multiple users sharing the same Spark cluster through independent Spark sessions (for example, Kyuubi's group session share level): # users may register functions from different UDF jars, and different jars may contain UDFs with the same name # the UDF class that takes effect is always the one from the first jar loaded, which does not meet user expectations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
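A minimal plain-Scala sketch of the failure mode described above (not Spark's actual JobArtifactSet API; all names here are hypothetical stand-ins): a shared, first-registration-wins function registry makes one session's same-named UDF invisible to another, while keying the registry by session ID isolates each session, analogous to giving each session its own artifact set and classloader.

```scala
// Hypothetical sketch: why a shared first-jar-wins UDF registry breaks
// multi-session clusters, and how per-session keying isolates sessions.
object UdfIsolationSketch {
  type Udf = Int => Int

  // Shared registry: the first registration of a name wins for everyone,
  // mirroring "the valid udf class is always the class from the first
  // load udf jar".
  class SharedRegistry {
    private val funcs = scala.collection.mutable.Map.empty[String, Udf]
    def register(name: String, f: Udf): Unit = {
      funcs.getOrElseUpdate(name, f) // later registrations are ignored
      ()
    }
    def lookup(name: String): Udf = funcs(name)
  }

  // Isolated registry: entries are keyed by (sessionId, name), so each
  // session only ever sees the UDFs it registered itself.
  class IsolatedRegistry {
    private val funcs =
      scala.collection.mutable.Map.empty[(String, String), Udf]
    def register(session: String, name: String, f: Udf): Unit =
      funcs((session, name)) = f
    def lookup(session: String, name: String): Udf = funcs((session, name))
  }
}
```

With the shared registry, whichever session registers "plus" first defines it for both sessions; with the isolated registry each session resolves its own version, which is the behavior the issue asks for.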
[jira] [Commented] (SPARK-45570) Spark job hangs due to task launch thread failed to create
[ https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776595#comment-17776595 ] lifulong commented on SPARK-45570: -- !image-2023-10-18-18-18-36-132.png! Catching the thread-creation exception thrown at the line "threadPool.execute(tr)", and then calling execBackend.statusUpdate(taskDescription.taskId, TaskState.FAILED, EMPTY_BYTE_BUFFER), should in theory fix this problem. Is this solution ok?
> Spark job hangs due to task launch thread failed to create
> ----------------------------------------------------------
>
> Key: SPARK-45570
> URL: https://issues.apache.org/jira/browse/SPARK-45570
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.2, 3.5.0
> Environment: spark.speculation uses the default value false; spark version 3.1.2
> Reporter: lifulong
> Priority: Major
> Attachments: image-2023-10-18-18-18-36-132.png
>
> The Spark job hangs: the web UI shows one task in a running stage that keeps running for multiple hours, while the other tasks finished in a few minutes.
> The executor never reports the task launch failure to the driver.
>
> Below is the task launch thread error log:
> 23/10/17 04:45:42 ERROR Inbox: An error happened while processing message in the inbox for Executor
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>     at org.apache.spark.executor.Executor.launchTask(Executor.scala:270)
>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:173)
>     at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>     at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>     at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
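The proposal in the comment above can be sketched in self-contained Scala with stand-in types (this is not Spark's actual Executor code; `statusUpdate`, `FAILED`, and the method shape are simplified stand-ins for execBackend.statusUpdate and TaskState.FAILED): wrap the threadPool.execute(tr) call and, on a thread-creation failure, report the task as failed instead of staying silent, so the driver can reschedule it rather than wait forever.

```scala
// Sketch of the proposed fix: report a launch failure to the scheduler
// backend instead of letting the task hang. Types are simplified stand-ins.
import java.util.concurrent.{ExecutorService, RejectedExecutionException}

object LaunchTaskSketch {
  sealed trait TaskState
  case object FAILED extends TaskState

  // Stand-in for execBackend.statusUpdate: records what was reported.
  val reported = scala.collection.mutable.ListBuffer.empty[(Long, TaskState)]
  def statusUpdate(taskId: Long, state: TaskState): Unit =
    reported += ((taskId, state))

  def launchTask(pool: ExecutorService, taskId: Long, tr: Runnable): Unit = {
    try pool.execute(tr)
    catch {
      // "unable to create new native thread" surfaces as OutOfMemoryError;
      // a shut-down or saturated pool throws RejectedExecutionException.
      // Either way, tell the driver the task failed so it can be retried
      // on another executor instead of appearing to run for hours.
      case _: OutOfMemoryError | _: RejectedExecutionException =>
        statusUpdate(taskId, FAILED)
    }
  }
}
```

In the sketch a rejected submission is simulated with a shut-down pool; in the real executor the same catch block would fire on the OutOfMemoryError shown in the log above.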
[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create
[ https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lifulong updated SPARK-45570: - Attachment: image-2023-10-18-18-18-36-132.png
[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create
[ https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lifulong updated SPARK-45570: - Affects Version/s: 3.5.0
[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create
[ https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lifulong updated SPARK-45570: - Environment: spark.speculation uses the default value false; spark version 3.1.2 was: spark.speculation uses the default value false
[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create
[ https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lifulong updated SPARK-45570: - Description:
The Spark job hangs: the web UI shows one task in a running stage that keeps running for multiple hours, while the other tasks finished in a few minutes.
The executor never reports the task launch failure to the driver.
Below is the task launch thread error log:
23/10/17 04:45:42 ERROR Inbox: An error happened while processing message in the inbox for Executor
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
    at org.apache.spark.executor.Executor.launchTask(Executor.scala:270)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:173)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
was:
The Spark job hangs: the web UI shows one task in a running stage that keeps running for multiple hours, while the other tasks finished in a few minutes.
Below is the task launch thread error log: (same log as above)
[jira] [Created] (SPARK-45570) Spark job hangs due to task launch thread failed to create
lifulong created SPARK-45570: Summary: Spark job hangs due to task launch thread failed to create Key: SPARK-45570 URL: https://issues.apache.org/jira/browse/SPARK-45570 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2 Environment: spark.speculation uses the default value false Reporter: lifulong
The Spark job hangs: the web UI shows one task in a running stage that keeps running for multiple hours, while the other tasks finished in a few minutes.
Below is the task launch thread error log:
23/10/17 04:45:42 ERROR Inbox: An error happened while processing message in the inbox for Executor
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
    at org.apache.spark.executor.Executor.launchTask(Executor.scala:270)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:173)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[jira] [Commented] (SPARK-24928) spark sql cross join running time too long
[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567986#comment-16567986 ] LIFULONG commented on SPARK-24928: -- for (x <- rdd1.iterator(currSplit.s1, context); y <- rdd2.iterator(currSplit.s2, context)) yield (x, y) This code is from the CartesianRDD.compute() method; it looks like it re-reads the right RDD from the text source once for every record in the left RDD.
> spark sql cross join running time too long
> ------------------------------------------
>
> Key: SPARK-24928
> URL: https://issues.apache.org/jira/browse/SPARK-24928
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 1.6.2
> Reporter: LIFULONG
> Priority: Minor
>
> Spark SQL runs far too long even though the left and right input tables are small HDFS text files.
> The SQL is: select * from t1 cross join t2
> t1 has 49 lines, three columns; t2 has 1 line, one column only.
> It runs for more than 30 minutes and then fails.
>
> Spark's CartesianRDD also has the same problem; example test code:
> val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b") //1 line 1 column
> val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b") //49 line 3 column
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.
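The behavior described in the comment can be demonstrated without Spark. The sketch below (plain Scala; `rightIterator` and the read counter are illustrative stand-ins, not Spark code) mirrors the for-comprehension in CartesianRDD.compute(): the right side's iterator is re-materialized once per left record, so an uncached text RDD is re-read from the source that many times.

```scala
// Why CartesianRDD is slow when the larger partition is on the left:
// compute() does
//   for (x <- rdd1.iterator(...); y <- rdd2.iterator(...)) yield (x, y)
// so rdd2's partition is re-read once per record of rdd1's partition.
object CartesianSketch {
  var rightReads = 0 // counts how many times the right side is re-read

  // Stand-in for rdd2.iterator(currSplit.s2, context): each call simulates
  // re-reading the right partition from its (text) source.
  def rightIterator(rows: Seq[Int]): Iterator[Int] = {
    rightReads += 1
    rows.iterator
  }

  // Mirrors the shape of CartesianRDD.compute().
  def cartesian(left: Seq[Int], right: Seq[Int]): List[(Int, Int)] =
    (for (x <- left.iterator; y <- rightIterator(right)) yield (x, y)).toList
}
```

This matches the observed asymmetry: with the 49-record RDD on the left, the right side is re-read 49 times per partition pair, whereas with the 1-record RDD on the left it is read once. Persisting the right-hand RDD (e.g. with cache()) before the cartesian is a common way to avoid the repeated source reads.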
[jira] [Updated] (SPARK-24928) spark sql cross join running time too long
[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LIFULONG updated SPARK-24928: - Description:
Spark SQL runs far too long even though the left and right input tables are small HDFS text files.
The SQL is: select * from t1 cross join t2
t1 has 49 lines, three columns; t2 has 1 line, one column only.
It runs for more than 30 minutes and then fails.
Spark's CartesianRDD also has the same problem; example test code:
val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b") //1 line 1 column
val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b") //49 line 3 column
val cartesian = new CartesianRDD(sc, twos, ones)
cartesian.count()
This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.
was:
Spark SQL runs far too long even though the left and right input tables are small text files.
The SQL is: select * from t1 cross join t2
t1 has 49 lines, three columns; t2 has 1 line, one column only.
It runs for more than 30 minutes and then fails.
Spark's CartesianRDD also has the same problem; example test code:
val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b") //1 line 1 column
val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b") //49 line 3 column
val cartesian = new CartesianRDD(sc, twos, ones)
cartesian.count()
This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.
[jira] [Updated] (SPARK-24928) spark sql cross join running time too long
[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LIFULONG updated SPARK-24928: - Priority: Minor (was: Major)
[jira] [Created] (SPARK-24928) spark sql cross join running time too long
LIFULONG created SPARK-24928: Summary: spark sql cross join running time too long Key: SPARK-24928 URL: https://issues.apache.org/jira/browse/SPARK-24928 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 1.6.2 Reporter: LIFULONG
Spark SQL runs far too long even though the left and right input tables are small text files.
The SQL is: select * from t1 cross join t2
t1 has 49 lines, three columns; t2 has 1 line, one column only.
It runs for more than 30 minutes and then fails.
Spark's CartesianRDD also has the same problem; example test code:
val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b") //1 line 1 column
val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b") //49 line 3 column
val cartesian = new CartesianRDD(sc, twos, ones)
cartesian.count()
This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.