[jira] [Created] (SPARK-48856) Use isolated JobArtifactSet for each spark session

2024-07-10 Thread lifulong (Jira)
lifulong created SPARK-48856:


 Summary: Use isolated JobArtifactSet for each spark session
 Key: SPARK-48856
 URL: https://issues.apache.org/jira/browse/SPARK-48856
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: lifulong


Consider multiple users sharing the same Spark cluster through independent Spark 
sessions (Kyuubi's GROUP share level, for example):
 # users may register functions from different UDF jars, and different jars may 
contain a UDF class with the same name
 # the UDF class that takes effect is always the one from the first-loaded jar, 
which does not meet users' expectations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-18 Thread lifulong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776595#comment-17776595
 ] 

lifulong commented on SPARK-45570:
--

!image-2023-10-18-18-18-36-132.png!
Catching the thread-creation exception thrown by "threadPool.execute(tr)" and then 
calling
execBackend.statusUpdate(taskDescription.taskId, TaskState.FAILED, 
EMPTY_BYTE_BUFFER)
should, in theory, fix this problem.
Is this solution OK?
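The proposed fix can be sketched with plain java.util.concurrent types, no Spark 
required. The names (launchTask, taskStates, TaskState) mirror the stack trace and 
comment above but are illustrative; a tiny bounded pool stands in for thread 
exhaustion, since a full queue throws RejectedExecutionException, which is easier 
to trigger in a demo than the OutOfMemoryError shown in the log:

```scala
import java.util.concurrent.{ArrayBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.collection.concurrent.TrieMap

object TaskState extends Enumeration { val RUNNING, FAILED = Value }

// Stand-in for the driver-visible task status that statusUpdate would report.
val taskStates = TrieMap.empty[Long, TaskState.Value]

// One worker thread and a one-slot queue: the third submission must fail,
// mimicking "unable to create new native thread" under thread exhaustion.
val pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS,
  new ArrayBlockingQueue[Runnable](1))

def launchTask(taskId: Long, tr: Runnable): Unit = {
  taskStates(taskId) = TaskState.RUNNING
  try pool.execute(tr)
  catch {
    // Report FAILED to the driver instead of silently dropping the task,
    // which is what lets the job hang today.
    case _: RejectedExecutionException | _: OutOfMemoryError =>
      taskStates(taskId) = TaskState.FAILED
  }
}
```

Without the catch block, the failed submission leaves the task marked RUNNING 
forever, which is exactly the hang described in this issue.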

> Spark job hangs due to task launch thread failed to create
> --
>
> Key: SPARK-45570
> URL: https://issues.apache.org/jira/browse/SPARK-45570
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.5.0
> Environment: spark.speculation uses its default value (false)
> spark version 3.1.2
>  
>Reporter: lifulong
>Priority: Major
> Attachments: image-2023-10-18-18-18-36-132.png
>
>
> A Spark job hangs: the web UI shows one task in a running stage that keeps 
> running for multiple hours, while all other tasks finished within a few minutes.
> The executor never reports the task launch failure to the driver.
>  
> Below is the task launch thread log from the executor:
> 23/10/17 04:45:42 ERROR Inbox: An error happened while processing message in 
> the inbox for Executor
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:717)
>         at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>         at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>         at org.apache.spark.executor.Executor.launchTask(Executor.scala:270)
>         at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:173)
>         at 
> org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>         at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>         at 
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)






[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-18 Thread lifulong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lifulong updated SPARK-45570:
-
Attachment: image-2023-10-18-18-18-36-132.png






[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-17 Thread lifulong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lifulong updated SPARK-45570:
-
Affects Version/s: 3.5.0







[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-17 Thread lifulong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lifulong updated SPARK-45570:
-
Environment: 
spark.speculation uses its default value (false)

spark version 3.1.2
 

  was:
spark.speculation uses its default value (false)
 








[jira] [Updated] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-17 Thread lifulong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lifulong updated SPARK-45570:
-
Description: 
A Spark job hangs: the web UI shows one task in a running stage that keeps 
running for multiple hours, while all other tasks finished within a few minutes.

The executor never reports the task launch failure to the driver.

 

The task launch thread log is the same "java.lang.OutOfMemoryError: unable to 
create new native thread" stack trace quoted in full above.

  was:
The same description, without the note that the executor never reports the 
task launch failure to the driver.



[jira] [Created] (SPARK-45570) Spark job hangs due to task launch thread failed to create

2023-10-17 Thread lifulong (Jira)
lifulong created SPARK-45570:


 Summary: Spark job hangs due to task launch thread failed to create
 Key: SPARK-45570
 URL: https://issues.apache.org/jira/browse/SPARK-45570
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
 Environment: spark.speculation uses its default value (false)
 
Reporter: lifulong


A Spark job hangs: the web UI shows one task in a running stage that keeps 
running for multiple hours, while all other tasks finished within a few minutes.

 

The task launch thread log is the same "java.lang.OutOfMemoryError: unable to 
create new native thread" stack trace quoted in full above.






[jira] [Commented] (SPARK-24928) spark sql cross join running time too long

2018-08-03 Thread LIFULONG (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567986#comment-16567986
 ] 

LIFULONG commented on SPARK-24928:
--

for (x <- rdd1.iterator(currSplit.s1, context);
     y <- rdd2.iterator(currSplit.s2, context)) yield (x, y)

 

This code is from the CartesianRDD.compute() method; it looks like it reloads 
the right RDD from its text source once for every record in the left RDD.
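The cost of that pattern can be reproduced with plain Scala iterators, no Spark 
required. A minimal sketch (the counter and helper names are illustrative) that 
builds the product the same way the for-comprehension above does:

```scala
// Reproduce the access pattern of CartesianRDD.compute() with plain Scala
// iterators: the right-hand source is re-opened once per left-hand record.
var rightReads = 0
def leftIterator: Iterator[Int] = (1 to 49).iterator                 // 49 "records"
def rightIterator: Iterator[Int] = { rightReads += 1; Iterator(1) }  // re-opened each time

val product = for (x <- leftIterator; y <- rightIterator) yield (x, y)
val pairs = product.length   // force evaluation of the lazy iterator
// rightReads ends up at 49: with a text-backed right RDD this means
// 49 re-reads of the file, once per left record.
```

This is why swapping the operand order (putting the side that is cheap to 
re-materialize on the right, or caching it first) changes the runtime so 
dramatically in the report below.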

> spark sql cross join running time too long
> --
>
> Key: SPARK-24928
> URL: https://issues.apache.org/jira/browse/SPARK-24928
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 1.6.2
>Reporter: LIFULONG
>Priority: Minor
>
> Spark SQL takes far too long when both the left and right input tables are 
> small HDFS text data.
> The query is:  select * from t1 cross join t2  
> t1 has 49 lines with three columns; t2 has a single line with one column.
> It runs for more than 30 minutes and then fails.
>  
> Spark's CartesianRDD has the same problem. Example test code:
> val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b")  // 1 line, 1 column
> val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b")  // 49 lines, 3 columns
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes 
> less than 10 seconds.






[jira] [Updated] (SPARK-24928) spark sql cross join running time too long

2018-07-26 Thread LIFULONG (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LIFULONG updated SPARK-24928:
-
Description: 
spark sql running time is too long while input left table and right table is 
small hdfs text format data,

the sql is:  select * from t1 cross join t2  

the line of t1 is 49, three column

the line of t2 is 1, one column only

running more than 30mins and then failed

 

 

spark CartesianRDD also has the same problem, example test code is:

val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b")  //1 line 1 
column
 val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b")  //49 
line 3 column
 val cartesian = new CartesianRDD(sc, twos, ones)

cartesian.count()

running more than 5 mins,while use CartesianRDD(sc, ones, twos) , it only use 
less than 10 seconds

  was:
spark sql running time is too long while input left table and right table is 
small text format data,

the sql is:  select * from t1 cross join t2  

the line of t1 is 49, three column

the line of t2 is 1, one column only

running more than 30mins and then failed

 

 

spark CartesianRDD also has the same problem, example test code is:

val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b") 
 //1 line 1 column
 val twos = 
sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b")  //49 
line 3 column
val cartesian = new CartesianRDD(sc, twos, ones)

cartesian.count()

running more than 5 mins,while use CartesianRDD(sc, ones, twos) , it only use 
less than 10 seconds








[jira] [Updated] (SPARK-24928) spark sql cross join running time too long

2018-07-26 Thread LIFULONG (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LIFULONG updated SPARK-24928:
-
Priority: Minor  (was: Major)







[jira] [Created] (SPARK-24928) spark sql cross join running time too long

2018-07-26 Thread LIFULONG (JIRA)
LIFULONG created SPARK-24928:


 Summary: spark sql cross join running time too long
 Key: SPARK-24928
 URL: https://issues.apache.org/jira/browse/SPARK-24928
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 1.6.2
Reporter: LIFULONG


Spark SQL takes far too long when both the left and right input tables are 
small text data.

The query is:  select * from t1 cross join t2  

t1 has 49 lines with three columns; t2 has a single line with one column.

It runs for more than 30 minutes and then fails.

 

Spark's CartesianRDD has the same problem. Example test code:

val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b")  // 1 line, 1 column
val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b")  // 49 lines, 3 columns
val cartesian = new CartesianRDD(sc, twos, ones)

cartesian.count()

This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes 
less than 10 seconds.


