[jira] [Updated] (SPARK-37640) rolled event log still need be clean after compact

2022-01-09 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37640:
---
Description: 
when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
be roll and compact(when set "spark.eventLog.compression.codec"), the directory 
tree like this

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1

files in the dir:

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

..

..

 

a "long run" spark application, the history server will not clean the 
'events__application_xxx__1.zstd' file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size of 
directory will be bigger and bigger during the whole lifetime of app. 

so i think we should provide a mechanism for user to clean the 
“events__application_xxx__1.zstd” file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory

 

Our solution: add a clean function in 
"[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs]". 
This function lists the files in 
"/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1" and cleans the 
"events__application_xxx__1.zstd" files according to the config 
"spark.history.fs.cleaner.maxAge". This solves the unlimited space increase, but 
it loses some events, especially the start event, so the history server can no 
longer show the event log correctly.

 

So we will need a more proper way to solve this.
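
Below is a minimal sketch of what such a cleanup pass could look like if it were 
called from checkForLogs(); the helper name cleanRolledEventLogFiles and its 
parameters are illustrative assumptions, not the actual FsHistoryProvider code. 
As noted above, blindly deleting rolled files can drop the application start 
event, so a real fix would have to keep at least the first/compacted file.

{code:java}
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: delete rolled "events_*" files in one application's
// event log directory that are older than spark.history.fs.cleaner.maxAge,
// leaving the appstatus file (and anything else) untouched.
def cleanRolledEventLogFiles(fs: FileSystem, appLogDir: Path, maxAgeMs: Long): Unit = {
  val threshold = System.currentTimeMillis() - maxAgeMs
  fs.listStatus(appLogDir)
    .filter(s => s.isFile && s.getPath.getName.startsWith("events_"))
    .filter(_.getModificationTime < threshold)
    .foreach(s => fs.delete(s.getPath, false))
}
{code}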

  was:
when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
be roll and compact(when set "spark.eventLog.compression.codec"), the directory 
tree like this

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1

files in the dir:

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

..

..

 

a "long run" spark application, the history server will not clean the 
'events__application_xxx__1.zstd' file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size of 
directory will be bigger and bigger during the whole lifetime of app. 

so i think we should provide a mechanism for user to clean the 
“events__application_xxx__1.zstd” file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory

 

Our solution: add a clean function in 
"https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs". 
This function lists the files in 
"/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1" and cleans the 
"events__application_xxx__1.zstd" files according to the config 
"spark.history.fs.cleaner.maxAge".


> rolled event log still need be clean after compact
> --
>
> Key: SPARK-37640
> URL: https://issues.apache.org/jira/browse/SPARK-37640
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
> be roll and compact(when set "spark.eventLog.compression.codec"), the 
> directory tree like this
> root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1
> file in dir:
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
> ..
> ..
>  
> a "long run" spark application, the history server will not clean the 
> 'events__application_xxx__1.zstd' file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size 
> of directory will be bigger and bigger during the whole lifetime of app. 
> so i think we should provide a mechanism for user to clean the 
> “events__application_xxx__1.zstd” file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory
>  
> our solution:add a clean function in 
> 

[jira] [Updated] (SPARK-37640) rolled event log still need be clean after compact

2022-01-09 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37640:
---
Description: 
when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
be roll and compact(when set "spark.eventLog.compression.codec"), the directory 
tree like this

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1

files in the dir:

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

..

..

 

a "long run" spark application, the history server will not clean the 
'events__application_xxx__1.zstd' file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size of 
directory will be bigger and bigger during the whole lifetime of app. 

so i think we should provide a mechanism for user to clean the 
“events__application_xxx__1.zstd” file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory

 

Our solution: add a clean function in 
"https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs". 
This function lists the files in 
"/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1" and cleans the 
"events__application_xxx__1.zstd" files according to the config 
"spark.history.fs.cleaner.maxAge".

  was:
when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
be roll and compact(when set "spark.eventLog.compression.codec"), the directory 
tree like this

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1

files in the dir:

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

..

..

 

a "long run" spark application, the history server will not clean the 
'events__application_xxx__1.zstd' file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size of 
directory will be bigger and bigger during the whole lifetime of app. 

so i think we should provide a mechanism for user to clean the 
“events__application_xxx__1.zstd” file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory

 


> rolled event log still need be clean after compact
> --
>
> Key: SPARK-37640
> URL: https://issues.apache.org/jira/browse/SPARK-37640
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
> be roll and compact(when set "spark.eventLog.compression.codec"), the 
> directory tree like this
> root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1
> file in dir:
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
> ..
> ..
>  
> a "long run" spark application, the history server will not clean the 
> 'events__application_xxx__1.zstd' file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size 
> of directory will be bigger and bigger during the whole lifetime of app. 
> so i think we should provide a mechanism for user to clean the 
> “events__application_xxx__1.zstd” file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory
>  
> our solution:add a clean function in 
> “https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs”,this
>  function will list the file in 
> “/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1” and clean 
> the “events__application_xxx__1.zstd” file according to the 
> config "spark.history.fs.cleaner.maxAge"



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37640) rolled event log still need be clean after compact

2022-01-09 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37640:
---
Description: 
when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
be roll and compact(when set "spark.eventLog.compression.codec"), the directory 
tree like this

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1

files in the dir:

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

 
/spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd

..

..

 

a "long run" spark application, the history server will not clean the 
'events__application_xxx__1.zstd' file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size of 
directory will be bigger and bigger during the whole lifetime of app. 

so i think we should provide a mechanism for user to clean the 
“events__application_xxx__1.zstd” file in 
/spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory

 

  was:when


> rolled event log still need be clean after compact
> --
>
> Key: SPARK-37640
> URL: https://issues.apache.org/jira/browse/SPARK-37640
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> when we set "{{{}spark.eventLog.rolling.enabled{}}} =true", the eventlog will 
> be roll and compact(when set "spark.eventLog.compression.codec"), the 
> directory tree like this
> root dir: /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1
> file in dir:
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
>  
> /spark2xJobHistory2x/eventlog_v2_application__xxx_1/events__application_xxx__1.zstd
> ..
> ..
>  
> a "long run" spark application, the history server will not clean the 
> 'events__application_xxx__1.zstd' file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1, so the size 
> of directory will be bigger and bigger during the whole lifetime of app. 
> so i think we should provide a mechanism for user to clean the 
> “events__application_xxx__1.zstd” file in 
> /spark2xJobHistory2x/eventlog_v2_application_xxx_xxx_1 directory
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37640) rolled event log still need be clean after compact

2022-01-09 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37640:
---
Description: when

> rolled event log still need be clean after compact
> --
>
> Key: SPARK-37640
> URL: https://issues.apache.org/jira/browse/SPARK-37640
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> when



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37640) rolled event log still need be clean after compact

2022-01-09 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471684#comment-17471684
 ] 

muhong commented on SPARK-37640:


OK, I will add a more detailed description.

> rolled event log still need be clean after compact
> --
>
> Key: SPARK-37640
> URL: https://issues.apache.org/jira/browse/SPARK-37640
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> when



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-09 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471681#comment-17471681
 ] 

muhong commented on SPARK-37821:


Yes; it's just that the modification is quite complicated.

> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
> is consumed fast. After a few months nextRddId overflows and newRddId returns a 
> negative number, but an RDD's block id needs to be positive, so this leads to the 
> exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
> format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
> As a result, data can no longer be exchanged during SQL execution and the SQL fails.
> If rddId overflows, an error occurs when rdd.mapPartitions executes. The error 
> occurs on the driver side, when the driver deserializes the block id from the 
> "block message" input stream.
> When the executor invokes rdd.mapPartitions, it calls the block manager to report 
> the block status. The block id is negative, so when the message is sent back to 
> the driver, the driver's regex fails to match and an exception is thrown.
>  
> How to fix the problem?
> SparkContext.scala
>  
> {code:java}
> ...
> ...
>   private val nextShuffleId = new AtomicInteger(0)
> 
>   private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()
> 
>   // change: `val` becomes `var` so the counter can be replaced after overflow
>   private var nextRddId = new AtomicInteger(0)
> 
>   /** Register a new RDD, returning its RDD ID */
>   // change: never hand out a negative id; reset the counter on overflow
>   private[spark] def newRddId(): Int = {
>     var id = nextRddId.getAndIncrement()
>     if (id > 0) {
>       return id
>     }
>     this.synchronized {
>       id = nextRddId.getAndIncrement()
>       if (id < 0) {
>         nextRddId = new AtomicInteger(0)
>         id = nextRddId.getAndIncrement()
>       }
>     }
>     id
>   }
> ...
> ...{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-06 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37821:
---
Description: 
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

If rddId overflows, an error occurs when rdd.mapPartitions executes. The error 
occurs on the driver side, when the driver deserializes the block id from the 
"block message" input stream.

When the executor invokes rdd.mapPartitions, it calls the block manager to report 
the block status. The block id is negative, so when the message is sent back to 
the driver, the driver's regex fails to match and an exception is thrown.
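
The failure mode can be illustrated with the block id pattern quoted from 
BlockId.scala; the small matcher around it is only a sketch for demonstration:

{code:java}
// The RDD block id pattern only accepts non-negative ids, so a block name built
// from an overflowed (negative) RDD id falls through to the parse error.
val RDD = "rdd_([0-9]+)_([0-9]+)".r

def parse(name: String): String = name match {
  case RDD(rddId, splitIndex) => s"ok: rdd=$rddId, split=$splitIndex"
  case _ => s"Failed to parse $name into block ID"
}

parse("rdd_2123452330_2")   // ok: rdd=2123452330, split=2
parse("rdd_-2123452330_2")  // Failed to parse rdd_-2123452330_2 into block ID
{code}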

 

How to fix the problem?

SparkContext.scala

 
{code:java}
...
...
  private val nextShuffleId = new AtomicInteger(0)

  private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()

  // change: `val` becomes `var` so the counter can be replaced after overflow
  private var nextRddId = new AtomicInteger(0)

  /** Register a new RDD, returning its RDD ID */
  // change: never hand out a negative id; reset the counter on overflow
  private[spark] def newRddId(): Int = {
    var id = nextRddId.getAndIncrement()
    if (id > 0) {
      return id
    }
    this.synchronized {
      id = nextRddId.getAndIncrement()
      if (id < 0) {
        nextRddId = new AtomicInteger(0)
        id = nextRddId.getAndIncrement()
      }
    }
    id
  }
...
...{code}
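
A note on the sketch above: the unsynchronized getAndIncrement keeps the common 
case lock-free, and only the rare overflow path takes the lock, re-checks the 
counter and swaps in a fresh AtomicInteger. After the reset, ids start again 
from 0, so in principle an id could be reused; presumably that is acceptable 
because the RDDs that received those ids months earlier are long gone, but it is 
a trade-off to be aware of.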

  was:
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.

 

How to fix the problem?

SparkContext.scala

 
{code:java}
...
...
  private val nextShuffleId = new AtomicInteger(0)

  private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()

  // change: `val` becomes `var` so the counter can be replaced after overflow
  private var nextRddId = new AtomicInteger(0)

  /** Register a new RDD, returning its RDD ID */
  // change: never hand out a negative id; reset the counter on overflow
  private[spark] def newRddId(): Int = {
    var id = nextRddId.getAndIncrement()
    if (id > 0) {
      return id
    }
    this.synchronized {
      id = nextRddId.getAndIncrement()
      if (id < 0) {
        nextRddId = new AtomicInteger(0)
        id = nextRddId.getAndIncrement()
      }
    }
    id
  }
...
...{code}


> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
> is consumed fast. After a few months nextRddId overflows and newRddId returns a 
> negative number, but an RDD's block id needs to be positive, so this leads to the 
> exception "Failed to parse rdd_-2123452330_2 into block 
> ID" (the RDD block id format is "val RDD = 
> 

[jira] [Updated] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-06 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37821:
---
Description: 
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.

 

How to fix the problem?

SparkContext.scala

 
{code:java}
...
...
  private val nextShuffleId = new AtomicInteger(0)

  private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()

  // change: `val` becomes `var` so the counter can be replaced after overflow
  private var nextRddId = new AtomicInteger(0)

  /** Register a new RDD, returning its RDD ID */
  // change: never hand out a negative id; reset the counter on overflow
  private[spark] def newRddId(): Int = {
    var id = nextRddId.getAndIncrement()
    if (id > 0) {
      return id
    }
    this.synchronized {
      id = nextRddId.getAndIncrement()
      if (id < 0) {
        nextRddId = new AtomicInteger(0)
        id = nextRddId.getAndIncrement()
      }
    }
    id
  }
...
...{code}

  was:
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.

 

How to fix the problem?

SparkContext.scala

 
{code:java}
...
...
  private val nextShuffleId = new AtomicInteger(0)

  private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()

  private val nextRddId = new AtomicInteger(0)

  /** Register a new RDD, returning its RDD ID */
  private[spark] def newRddId(): Int = nextRddId.getAndIncrement()

  /**
   * Registers listeners specified in spark.extraListeners, then starts the listener bus.
   * This should be called after all internal listeners have been registered with the
   * listener bus (e.g. after the web UI and event logging listeners have been registered).
   */
  private def setupAndStartListenerBus(): Unit = {
    try {
      conf.get(EXTRA_LISTENERS).foreach { classNames =>
        val listeners = Utils.loadExtensions(classOf[SparkListenerInterface], classNames, conf)
        listeners.foreach { listener =>
          listenerBus.addToSharedQueue(listener)
          logInfo(s"Registered listener ${listener.getClass().getName()}")
        }
      }
    } catch {
      case e: Exception =>
        try {
          stop()
        } finally {
          throw new SparkException(s"Exception when registering SparkListener", e)
        }
    }
    listenerBus.start(this, _env.metricsSystem)
    _listenerBusStarted = true
  }
...
...{code}


> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
>  will be 

[jira] [Updated] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-06 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37821:
---
Description: 
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.

 

How to fix the problem?

SparkContext.scala

 
{code:java}
...
...
  private val nextShuffleId = new AtomicInteger(0)

  private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()

  private val nextRddId = new AtomicInteger(0)

  /** Register a new RDD, returning its RDD ID */
  private[spark] def newRddId(): Int = nextRddId.getAndIncrement()

  /**
   * Registers listeners specified in spark.extraListeners, then starts the listener bus.
   * This should be called after all internal listeners have been registered with the
   * listener bus (e.g. after the web UI and event logging listeners have been registered).
   */
  private def setupAndStartListenerBus(): Unit = {
    try {
      conf.get(EXTRA_LISTENERS).foreach { classNames =>
        val listeners = Utils.loadExtensions(classOf[SparkListenerInterface], classNames, conf)
        listeners.foreach { listener =>
          listenerBus.addToSharedQueue(listener)
          logInfo(s"Registered listener ${listener.getClass().getName()}")
        }
      }
    } catch {
      case e: Exception =>
        try {
          stop()
        } finally {
          throw new SparkException(s"Exception when registering SparkListener", e)
        }
    }
    listenerBus.start(this, _env.metricsSystem)
    _listenerBusStarted = true
  }
...
...{code}

  was:
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.


> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
> is consumed fast. After a few months nextRddId overflows and newRddId returns a 
> negative number, but an RDD's block id needs to be positive, so this leads to the 
> exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
> format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
> As a result, data can no longer be exchanged during SQL execution and the SQL fails.
>  
> If rddId overflows, an error occurs when rdd.mapPartitions executes.
>  
> How to fix the problem?
> SparkContext.scala
>  
> {code:java}
> ...
> 

[jira] [Updated] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-06 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37821:
---
Description: 
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

 

If rddId overflows, an error occurs when rdd.mapPartitions executes.

  was:
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.


> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
> is consumed fast. After a few months nextRddId overflows and newRddId returns a 
> negative number, but an RDD's block id needs to be positive, so this leads to the 
> exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
> format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
> As a result, data can no longer be exchanged during SQL execution and the SQL fails.
>  
> If rddId overflows, an error occurs when rdd.mapPartitions executes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-05 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37821:
---
Description: 
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
As a result, data can no longer be exchanged during SQL execution and the SQL fails.

  was:
This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID". As a result, data 
can no longer be exchanged during SQL execution and the SQL fails.


> spark thrift server RDD ID overflow lead sql execute failed
> ---
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: muhong
>Priority: Major
>
> This problem happens in long-running Spark applications, such as the Thrift server.
> Since there is only one SparkContext instance on the Thrift server driver side, if 
> the concurrency of SQL requests is high or the SQL is very complicated (which 
> creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
> (SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
> is consumed fast. After a few months nextRddId overflows and newRddId returns a 
> negative number, but an RDD's block id needs to be positive, so this leads to the 
> exception "Failed to parse rdd_-2123452330_2 into block ID" (the RDD block id 
> format is "val RDD = "rdd_([0-9]+)_([0-9]+)".r": [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]). 
> As a result, data can no longer be exchanged during SQL execution and the SQL fails.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

2022-01-05 Thread muhong (Jira)
muhong created SPARK-37821:
--

 Summary: spark thrift server RDD ID overflow lead sql execute 
failed
 Key: SPARK-37821
 URL: https://issues.apache.org/jira/browse/SPARK-37821
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: muhong


This problem happens in long-running Spark applications, such as the Thrift server.

Since there is only one SparkContext instance on the Thrift server driver side, if 
the concurrency of SQL requests is high or the SQL is very complicated (which 
creates a lot of RDDs), RDDs are generated very fast and the RDD id counter 
(SparkContext.scala#nextRddId: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala]) 
is consumed fast. After a few months nextRddId overflows and newRddId returns a 
negative number, but an RDD's block id needs to be positive, so this leads to the 
exception "Failed to parse rdd_-2123452330_2 into block ID". As a result, data 
can no longer be exchanged during SQL execution and the SQL fails.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37639) spark history server clean event log directory with out check status file

2021-12-14 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459673#comment-17459673
 ] 

muhong edited comment on SPARK-37639 at 12/15/21, 6:31 AM:
---

To solve the problem, I modified the logic in EventLogFileWriters: when the 
rollEventLogFile method is invoked, check first whether the status file exists 
and recreate it if not. It works.

 
{code:java}
  /** exposed for testing only */
  private[history] def rollEventLogFile(): Unit = {
    // check whether the status file exists; if not, recreate it
    if (!checkAppStatusFileExist(inProgress = true)) {
      createAppStatusFile(inProgress = true)
    }

    closeWriter()

    index += 1
    currentEventLogFilePath = getEventLogFilePath(logDirForAppPath, appId, appAttemptId, index,
      compressionCodecName)

    initLogFile(currentEventLogFilePath) { os =>
      countingOutputStream = Some(new CountingOutputStream(os))
      new PrintWriter(
        new OutputStreamWriter(countingOutputStream.get, StandardCharsets.UTF_8))
    }
  }

  // new check method
  private def checkAppStatusFileExist(inProgress: Boolean): Boolean = {
    val appStatusPath = getAppStatusFilePath(logDirForAppPath, appId, appAttemptId, inProgress)
    fileSystem.exists(appStatusPath)
  }
{code}
 

Why did I choose to modify it like this rather than change the logic in the 
history server? Because a long-running Spark Thrift server might exit abnormally 
without updating the status file, but that directory still needs to be deleted.

During testing I found another problem: if the Thrift server exits normally 
without writing any event log, the history server cannot delete that kind of 
directory, because it throws an IllegalArgumentException "directory must have at 
least one event log file".


was (Author: m-sir):
To solve the problem, I modified the logic in EventLogFileWriters: when the 
rollEventLogFile method is invoked, check first whether the status file exists 
and recreate it if not. It works.

 
{code:java}
  /** exposed for testing only */
  private[history] def rollEventLogFile(): Unit = {
    // check whether the status file exists; if not, recreate it

    closeWriter()

    index += 1
    currentEventLogFilePath = getEventLogFilePath(logDirForAppPath, appId, appAttemptId, index,
      compressionCodecName)

    initLogFile(currentEventLogFilePath) { os =>
      countingOutputStream = Some(new CountingOutputStream(os))
      new PrintWriter(
        new OutputStreamWriter(countingOutputStream.get, StandardCharsets.UTF_8))
    }
  }
{code}
 

Why did I choose to modify it like this rather than change the logic in the 
history server? Because a long-running Spark Thrift server might exit abnormally 
without updating the status file, but that directory still needs to be deleted.

During testing I found another problem: if the Thrift server exits normally 
without writing any event log, the history server cannot delete that kind of 
directory, because it throws an IllegalArgumentException "directory must have at 
least one event log file".

> spark history server clean event log directory with out check status file
> -
>
> Key: SPARK-37639
> URL: https://issues.apache.org/jira/browse/SPARK-37639
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> I found a problem: the Thrift server creates the event log file (the .inprogress 
> file is created at init), and the history server cleans application event log 
> files according to size and modification time. So there is a potential problem 
> in this situation:
> *If the Thrift server receives no requests for a long time (longer than the time 
> configured by spark.history.fs.cleaner.maxAge), the history server will clean the 
> application log directory together with the inprogress file; after the cleanup the 
> Thrift server receives a lot of requests and generates a new event log directory 
> without an inprogress status file, and that directory will never be cleaned by the 
> history server because it contains no status file. This will lead to a space leak.*
> I think whenever a new log file is created, we need to check whether the status 
> file exists and create it if not.
> Lastly, I think an extra function needs to be added: like log4j, the compacted 
> files still need to be cleaned after a period (configured by the user), so that the 
> event log space of a long-running Spark service like the Thrift server can be 
> limited to a configured size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37639) spark history server clean event log directory with out check status file

2021-12-14 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459673#comment-17459673
 ] 

muhong commented on SPARK-37639:


To solve the problem, I modified the logic in EventLogFileWriters: when the 
rollEventLogFile method is invoked, check first whether the status file exists 
and recreate it if not. It works.

 
{code:java}
  /** exposed for testing only */
  private[history] def rollEventLogFile(): Unit = {
    // check whether the status file exists; if not, recreate it

    closeWriter()

    index += 1
    currentEventLogFilePath = getEventLogFilePath(logDirForAppPath, appId, appAttemptId, index,
      compressionCodecName)

    initLogFile(currentEventLogFilePath) { os =>
      countingOutputStream = Some(new CountingOutputStream(os))
      new PrintWriter(
        new OutputStreamWriter(countingOutputStream.get, StandardCharsets.UTF_8))
    }
  }
{code}
 

Why did I choose to modify it like this rather than change the logic in the 
history server? Because a long-running Spark Thrift server might exit abnormally 
without updating the status file, but that directory still needs to be deleted.

During testing I found another problem: if the Thrift server exits normally 
without writing any event log, the history server cannot delete that kind of 
directory, because it throws an IllegalArgumentException "directory must have at 
least one event log file".

> spark history server clean event log directory with out check status file
> -
>
> Key: SPARK-37639
> URL: https://issues.apache.org/jira/browse/SPARK-37639
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> I found a problem: the Thrift server creates the event log file (the .inprogress 
> file is created at init), and the history server cleans application event log 
> files according to size and modification time. So there is a potential problem 
> in this situation:
> *If the Thrift server receives no requests for a long time (longer than the time 
> configured by spark.history.fs.cleaner.maxAge), the history server will clean the 
> application log directory together with the inprogress file; after the cleanup the 
> Thrift server receives a lot of requests and generates a new event log directory 
> without an inprogress status file, and that directory will never be cleaned by the 
> history server because it contains no status file. This will lead to a space leak.*
> I think whenever a new log file is created, we need to check whether the status 
> file exists and create it if not.
> Lastly, I think an extra function needs to be added: like log4j, the compacted 
> files still need to be cleaned after a period (configured by the user), so that the 
> event log space of a long-running Spark service like the Thrift server can be 
> limited to a configured size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37639) spark history server clean event log directory with out check status file

2021-12-13 Thread muhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37639:
---
Description: 
i foud a problem, the thrift server create event log file(.inprogress file 
create at init), and history server clean the application event log file 
according size and modtime. so there is a potential problem under this situation

*if the thrift server accept no quest long time(longer than time config by 
spark.history.fs.cleaner.maxAge), the history server will clean  the 
applicaiton log [directory] with the inprogress file; after clean  the thrift 
server accept a lot of request ,and will generate new event log directory 
without inprogress status file, and the director will never be clean by history 
server because it not contain status file. this will leads spack leak*

i think whenever create new log file , need to check wether the status file is 
exist, if not create it

last i think extra function need add, like log4j the compact file stii need to 
be clean after a period(config by user),so ,long run spark service like thrift 
server‘s event log file space can be limit in a config size

> spark history server clean event log directory with out check status file
> -
>
> Key: SPARK-37639
> URL: https://issues.apache.org/jira/browse/SPARK-37639
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: muhong
>Priority: Major
>
> I found a problem: the Thrift server creates the event log file (the .inprogress 
> file is created at init), and the history server cleans application event log 
> files according to size and modification time. So there is a potential problem 
> in this situation:
> *If the Thrift server receives no requests for a long time (longer than the time 
> configured by spark.history.fs.cleaner.maxAge), the history server will clean the 
> application log directory together with the inprogress file; after the cleanup the 
> Thrift server receives a lot of requests and generates a new event log directory 
> without an inprogress status file, and that directory will never be cleaned by the 
> history server because it contains no status file. This will lead to a space leak.*
> I think whenever a new log file is created, we need to check whether the status 
> file exists and create it if not.
> Lastly, I think an extra function needs to be added: like log4j, the compacted 
> files still need to be cleaned after a period (configured by the user), so that the 
> event log space of a long-running Spark service like the Thrift server can be 
> limited to a configured size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37640) rolled event log still need be clean after compact

2021-12-13 Thread muhong (Jira)
muhong created SPARK-37640:
--

 Summary: rolled event log still need be clean after compact
 Key: SPARK-37640
 URL: https://issues.apache.org/jira/browse/SPARK-37640
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.1
Reporter: muhong






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37639) spark history server clean event log directory with out check status file

2021-12-13 Thread muhong (Jira)
muhong created SPARK-37639:
--

 Summary: spark history server clean event log directory with out 
check status file
 Key: SPARK-37639
 URL: https://issues.apache.org/jira/browse/SPARK-37639
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.1
Reporter: muhong






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28869) Roll over event log files

2021-12-13 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458931#comment-17458931
 ] 

muhong commented on SPARK-28869:


I found a problem: the Thrift server creates the event log file (the .inprogress 
file is created at init), and the history server cleans application event log 
files according to size and modification time. So there is a potential problem in 
this situation:

*If the Thrift server receives no requests for a long time (longer than the time 
configured by spark.history.fs.cleaner.maxAge), the history server will clean the 
application log directory together with the inprogress file; after the cleanup the 
Thrift server receives a lot of requests and generates a new event log directory 
without an inprogress status file, and that directory will never be cleaned by the 
history server because it contains no status file. This will lead to a space leak.*

I think whenever a new log file is created, we need to check whether the status 
file exists and create it if not.

Lastly, I think an extra function needs to be added: like log4j, the compacted 
files still need to be cleaned after a period (configured by the user), so that 
the event log space of a long-running Spark service like the Thrift server can be 
limited to a configured size.

 

> Roll over event log files
> -
>
> Key: SPARK-28869
> URL: https://issues.apache.org/jira/browse/SPARK-28869
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue tracks the effort on rolling over event log files in driver and 
> let SHS replay the multiple event log files correctly.
> This issue doesn't deal with overall size of event log, as well as no 
> guarantee when deleting old event log files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31602) memory leak of JobConf

2021-09-26 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420286#comment-17420286
 ] 

muhong commented on SPARK-31602:


We hit the same problem on the Spark driver side, but have not found the answer.

We found that the leaked JobConf is referenced from inside a DistributedFileSystem, 
and the DistributedFileSystem instances are stored in the FileSystem$Cache; it 
seems the DistributedFileSystem instances were never closed.
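
For context, Hadoop's FileSystem.get() caches filesystem instances in 
FileSystem$Cache (keyed by scheme, authority and UGI), and an entry only leaves 
the cache when the instance is closed. A hedged sketch of the usual ways to keep 
such references from accumulating; nothing here is taken from the Spark code in 
question:

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()

// Option 1: bypass the shared cache and close the instance yourself.
val fs = FileSystem.newInstance(new Path("hdfs:///").toUri, conf)
try {
  // ... use fs ...
} finally {
  fs.close() // releases the instance and whatever it references (e.g. a JobConf)
}

// Option 2: disable caching for the scheme entirely.
conf.setBoolean("fs.hdfs.impl.disable.cache", true)
{code}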

 

> memory leak of JobConf
> --
>
> Key: SPARK-31602
> URL: https://issues.apache.org/jira/browse/SPARK-31602
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>  Labels: bulk-closed
> Attachments: image-2020-04-29-14-34-39-496.png, 
> image-2020-04-29-14-35-55-986.png
>
>
> !image-2020-04-29-14-34-39-496.png!
> !image-2020-04-29-14-35-55-986.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org