[jira] [Updated] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-26992:
---
Description: 
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.

 

For example: the second session does not set a pool name manually, so the 
default pool should be used; instead, the pool set by the previous user is 
applied, which is incorrect.

!image-2019-02-26-15-20-51-076.png!

 

!image-2019-02-26-15-21-02-966.png!

  was:
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.


> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark Thrift Server saves this value in a thread-local LocalProperty but 
> does not clean it up after the statement runs, so other sessions run in the 
> previously set pool.
>  
> For example: the second session does not set a pool name manually, so the 
> default pool should be used; instead, the pool set by the previous user is 
> applied, which is incorrect.
> !image-2019-02-26-15-20-51-076.png!
>  
> !image-2019-02-26-15-21-02-966.png!
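
A minimal sketch of the kind of cleanup implied here (the helper name and 
wiring are assumptions, not the actual STS code path): set the pool as a 
thread-local property before the statement runs and clear it in a finally 
block, so the next session executing on the same thread falls back to the 
default pool.

{code:scala}
import org.apache.spark.SparkContext

// Hypothetical helper; illustrates the cleanup, not the real STS internals.
def runStatement(sc: SparkContext, pool: Option[String])(body: => Unit): Unit = {
  // "spark.scheduler.pool" is a thread-local property read by the fair scheduler.
  pool.foreach(p => sc.setLocalProperty("spark.scheduler.pool", p))
  try {
    body
  } finally {
    // Setting the property to null removes the thread-local value, so a later
    // session reusing this thread does not inherit the previous user's pool.
    sc.setLocalProperty("spark.scheduler.pool", null)
  }
}
{code}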






[jira] [Updated] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-26992:
---
Description: 
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.

 

For example: the second session does not set a pool name manually, so the 
default pool should be used; instead, the pool set by the previous user is 
applied, which is incorrect.

 

  was:
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.

 

For example: the second session does not set a pool name manually, so the 
default pool should be used; instead, the pool set by the previous user is 
applied, which is incorrect.

!image-2019-02-26-15-20-51-076.png!

 

!image-2019-02-26-15-21-02-966.png!


> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark Thrift Server saves this value in a thread-local LocalProperty but 
> does not clean it up after the statement runs, so other sessions run in the 
> previously set pool.
>  
> For example: the second session does not set a pool name manually, so the 
> default pool should be used; instead, the pool set by the previous user is 
> applied, which is incorrect.
>  






[jira] [Updated] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-26992:
---
Attachment: error_session.png

> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
> Attachments: error_session.png, error_stage.png
>
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark Thrift Server saves this value in a thread-local LocalProperty but 
> does not clean it up after the statement runs, so other sessions run in the 
> previously set pool.
>  
> For example: the second session does not set a pool name manually, so the 
> default pool should be used; instead, the pool set by the previous user is 
> applied, which is incorrect.
>  






[jira] [Updated] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-26992:
---
Description: 
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.

 

For example: the second session does not set a pool name manually, so the 
default pool should be used; instead, the pool set by the previous user is 
applied, which is incorrect.

!error_session.png!

 

!error_stage.png!

 

  was:
The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.

 

For example: the second session does not set a pool name manually, so the 
default pool should be used; instead, the pool set by the previous user is 
applied, which is incorrect.

 


> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
> Attachments: error_session.png, error_stage.png
>
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark Thrift Server saves this value in a thread-local LocalProperty but 
> does not clean it up after the statement runs, so other sessions run in the 
> previously set pool.
>  
> For example: the second session does not set a pool name manually, so the 
> default pool should be used; instead, the pool set by the previous user is 
> applied, which is incorrect.
> !error_session.png!
>  
> !error_stage.png!
>  






[jira] [Created] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)
dzcxzl created SPARK-26992:
--

 Summary: Fix STS scheduler pool correct delivery
 Key: SPARK-26992
 URL: https://issues.apache.org/jira/browse/SPARK-26992
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 2.0.0
Reporter: dzcxzl


The user sets the value of spark.sql.thriftserver.scheduler.pool.
Spark Thrift Server saves this value in a thread-local LocalProperty but does 
not clean it up after the statement runs, so other sessions run in the 
previously set pool.






[jira] [Updated] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-02-25 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-26992:
---
Attachment: error_stage.png

> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
> Attachments: error_session.png, error_stage.png
>
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark Thrift Server saves this value in a thread-local LocalProperty but 
> does not clean it up after the statement runs, so other sessions run in the 
> previously set pool.
>  
> For example: the second session does not set a pool name manually, so the 
> default pool should be used; instead, the pool set by the previous user is 
> applied, which is incorrect.
>  






[jira] [Created] (SPARK-27073) Handling of IdleStateEvent causes the normal connection to close

2019-03-06 Thread dzcxzl (JIRA)
dzcxzl created SPARK-27073:
--

 Summary: Handling of IdleStateEvent causes the normal connection 
to close
 Key: SPARK-27073
 URL: https://issues.apache.org/jira/browse/SPARK-27073
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0, 2.0.0
Reporter: dzcxzl


When TransportChannelHandler processes an IdleStateEvent, it first checks 
whether the time since the last request has exceeded the timeout. If 
TransportClient.sendRpc initiates a request at that very moment, 
TransportChannelHandler sees responseHandler.numOutstandingRequests() > 0 and 
closes a connection that is actually healthy.
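
A sketch of the race in miniature (names are hypothetical; the real code 
lives in Netty-based TransportChannelHandler/TransportClient): the fix is to 
make the idle decision and request submission observe a consistent state, 
e.g. by taking them under one lock and marking the connection timed out 
before closing it.

{code:scala}
// Hypothetical model of the race; not the actual Spark network code.
class IdleConnection(timeoutNs: Long) {
  private var lastRequestTimeNs = System.nanoTime()
  private var outstandingRequests = 0
  private var timedOut = false

  def sendRpc(): Unit = synchronized {
    if (timedOut) throw new IllegalStateException("connection already closed")
    lastRequestTimeNs = System.nanoTime()
    outstandingRequests += 1
  }

  // Invoked when an IdleStateEvent fires; returns true if we should close.
  def onIdleEvent(): Boolean = synchronized {
    val isIdle = System.nanoTime() - lastRequestTimeNs > timeoutNs
    // Checked atomically with sendRpc, so a request arriving between the idle
    // computation and the outstanding-request check cannot be miscounted.
    if (isIdle && outstandingRequests > 0) { timedOut = true; true } else false
  }
}
{code}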






[jira] [Updated] (SPARK-27073) Fix a race condition when handling of IdleStateEvent

2019-03-10 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-27073:
---
Summary: Fix a race condition when handling of IdleStateEvent  (was: 
Handling of IdleStateEvent causes the normal connection to close)

> Fix a race condition when handling of IdleStateEvent
> 
>
> Key: SPARK-27073
> URL: https://issues.apache.org/jira/browse/SPARK-27073
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
>
> When TransportChannelHandler processes an IdleStateEvent, it first checks 
> whether the time since the last request has exceeded the timeout. If 
> TransportClient.sendRpc initiates a request at that very moment, 
> TransportChannelHandler sees responseHandler.numOutstandingRequests() > 0 
> and closes a connection that is actually healthy.






[jira] [Updated] (SPARK-27630) Stage retry causes totalRunningTasks calculation to be negative

2019-06-21 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-27630:
---
Description: 
Track tasks separately for each stage attempt (instead of tracking by stage), 
and do NOT reset numRunningTasks to 0 on StageCompleted.

In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
stuck.
 A similar problem also exists with {{stageIdToTaskIndices}} & 
{{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
zombie stage causes {{stageIdToTaskIndices}} or 
{{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
stage, and {{totalPendingTasks}} increases unexpectedly.

  was:
In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
stuck.
A similar problem also exists with {{stageIdToTaskIndices}} & 
{{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
zombie stage causes {{stageIdToTaskIndices}} or 
{{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
stage, and {{totalPendingTasks}} increases unexpectedly.


> Stage retry causes totalRunningTasks calculation to be negative
> ---
>
> Key: SPARK-27630
> URL: https://issues.apache.org/jira/browse/SPARK-27630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: dzcxzl
>Priority: Minor
>
> Track tasks separately for each stage attempt (instead of tracking by 
> stage), and do NOT reset numRunningTasks to 0 on StageCompleted.
> In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
> sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
> stuck.
>  A similar problem also exists with {{stageIdToTaskIndices}} & 
> {{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
> zombie stage causes {{stageIdToTaskIndices}} or 
> {{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
> stage, and {{totalPendingTasks}} increases unexpectedly.
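
A minimal sketch of the per-attempt bookkeeping described above (names are 
hypothetical): keying by (stageId, attemptId) means a taskEnd from a zombie 
attempt only touches its own counters, so the live attempt's totals can never 
go negative.

{code:scala}
import scala.collection.mutable

// Hypothetical tracker keyed by stage attempt instead of by stage.
case class StageAttempt(stageId: Int, attemptId: Int)

class RunningTaskTracker {
  private val running = mutable.Map.empty[StageAttempt, Int].withDefaultValue(0)

  def onTaskStart(a: StageAttempt): Unit = running(a) += 1

  def onTaskEnd(a: StageAttempt): Unit =
    // A taskEnd from a zombie attempt only decrements its own counter.
    if (running(a) > 0) running(a) -= 1

  // No reset to 0 on StageCompleted; just sum what is actually still running.
  def totalRunningTasks: Int = running.values.sum
}
{code}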






[jira] [Updated] (SPARK-27630) Stage retry causes totalRunningTasks calculation to be negative

2019-06-21 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-27630:
---
Description: 
Track tasks separately for each stage attempt (instead of tracking by stage), 
and do NOT reset numRunningTasks to 0 on StageCompleted.

In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
stuck.
 A similar problem also exists with {{stageIdToTaskIndices}} & 
{{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
zombie stage causes {{stageIdToTaskIndices}} or 
{{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
stage, and {{totalPendingTasks}} increases unexpectedly.

  was:
Track tasks separately for each stage attempt (instead of tracking by stage), 
and do NOT reset numRunningTasks to 0 on StageCompleted.

In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
stuck.
 A similar problem also exists with {{stageIdToTaskIndices}} & 
{{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
zombie stage causes {{stageIdToTaskIndices}} or 
{{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
stage, and {{totalPendingTasks}} increases unexpectedly.


> Stage retry causes totalRunningTasks calculation to be negative
> ---
>
> Key: SPARK-27630
> URL: https://issues.apache.org/jira/browse/SPARK-27630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: dzcxzl
>Priority: Minor
>
> Track tasks separately for each stage attempt (instead of tracking by 
> stage), and do NOT reset numRunningTasks to 0 on StageCompleted.
> In the case of a stage retry, the {{taskEnd}} event from the zombie stage 
> sometimes makes {{totalRunningTasks}} negative, which causes the job to get 
> stuck.
>  A similar problem also exists with {{stageIdToTaskIndices}} & 
> {{stageIdToSpeculativeTaskIndices}}: a failed {{taskEnd}} event from the 
> zombie stage causes {{stageIdToTaskIndices}} or 
> {{stageIdToSpeculativeTaskIndices}} to remove a task index of the active 
> stage, and {{totalPendingTasks}} increases unexpectedly.






[jira] [Created] (SPARK-28305) When the AM cannot obtain the container loss reason, ask GetExecutorLossReason times out

2019-07-08 Thread dzcxzl (JIRA)
dzcxzl created SPARK-28305:
--

 Summary: When the AM cannot obtain the container loss reason, ask 
GetExecutorLossReason times out
 Key: SPARK-28305
 URL: https://issues.apache.org/jira/browse/SPARK-28305
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.4.0
Reporter: dzcxzl


In some cases, such as when the NM machine crashes or shuts down, the driver 
asks the AM for GetExecutorLossReason, but the AM's 
getCompletedContainersStatuses cannot obtain the container's failure 
information.

The YARN NM detection timeout is 10 minutes, controlled by the parameter 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms, so the AM 
has to wait 10 minutes to learn the cause of the container failure.

Although the driver's ask fails and it then tries to recover, the 2-minute 
timeout (spark.network.timeout) configured on the IdleStateHandler closes the 
connection between the driver and the AM; the AM exits, the application 
finishes, and the driver exits, causing the job to fail.


AM LOG:

19/07/08 16:56:48 [dispatcher-event-loop-0] INFO YarnAllocator: add executor 
951 to pendingLossReasonRequests for get the loss reason
19/07/08 16:58:48 [dispatcher-event-loop-26] INFO ApplicationMaster$AMEndpoint: 
Driver terminated or disconnected! Shutting down.
19/07/08 16:58:48 [dispatcher-event-loop-26] INFO ApplicationMaster: Final app 
status: SUCCEEDED, exitCode: 0


Driver LOG:

19/07/08 16:58:48,476 [rpc-server-3-3] ERROR TransportChannelHandler: 
Connection to /xx.xx.xx.xx:19398 has been quiet for 120000 ms while there are 
outstanding requests. Assuming connection is dead; please adjust 
spark.network.timeout if this is wrong.
19/07/08 16:58:48,476 [rpc-server-3-3] ERROR TransportResponseHandler: Still 
have 1 requests outstanding when connection from /xx.xx.xx.xx:19398 is closed
19/07/08 16:58:48,510 [rpc-server-3-3] WARN NettyRpcEnv: Ignored failure: 
java.io.IOException: Connection from /xx.xx.xx.xx:19398 closed
19/07/08 16:58:48,516 [netty-rpc-env-timeout] WARN 
YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss 
reason for executor id 951 at RPC address xx.xx.xx.xx:49175, but got no 
response. Marking as slave lost.
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from null in 
120 seconds. This timeout is controlled by spark.rpc.askTimeout
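
For illustration only (the values are assumptions, not the ticket's eventual 
fix): one mitigation consistent with the timeouts above is to keep the 
driver-side RPC and network timeouts above YARN's 10-minute 
container-allocation expiry, so the driver-AM channel survives the wait for 
the loss reason.

{code:scala}
import org.apache.spark.SparkConf

// Illustrative values only: both timeouts set above YARN's 10-minute
// container-allocation expiry interval.
val conf = new SparkConf()
  .set("spark.network.timeout", "660s")
  .set("spark.rpc.askTimeout", "660s")
{code}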






[jira] [Created] (SPARK-28314) Infinite recursion loop in MemoryAllocator#allocate when build HashedRelation

2019-07-09 Thread dzcxzl (JIRA)
dzcxzl created SPARK-28314:
--

 Summary: Infinite recursion loop in MemoryAllocator#allocate when 
build HashedRelation
 Key: SPARK-28314
 URL: https://issues.apache.org/jira/browse/SPARK-28314
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: dzcxzl


Broadcasting tables with a large number of rows may cause an infinite 
recursion loop in TaskMemoryManager#allocatePage.
This is because HashedRelation constructs its MemoryManager with 
Long.MaxValue instead of the memory configured for the driver.
MemoryAllocator#allocate throws an OOM, TaskMemoryManager#allocatePage 
catches it and calls allocatePage again, looping forever.
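
A sketch of the loop in miniature (a hypothetical reduction, not the actual 
TaskMemoryManager code): because the manager is constructed with 
Long.MaxValue, the memory-budget guard never trips, and the catch-and-retry 
recurses forever.

{code:scala}
// Hypothetical reduction of the recursion; not the real TaskMemoryManager.
def allocatePage(size: Long, maxMemory: Long): Array[Byte] = {
  try {
    // With maxMemory = Long.MaxValue this guard can never fail the request.
    require(size <= maxMemory, "not enough memory")
    new Array[Byte](size.toInt) // may throw OutOfMemoryError for huge sizes
  } catch {
    case _: OutOfMemoryError =>
      // The OOM is swallowed and the same request is retried, looping forever.
      allocatePage(size, maxMemory)
  }
}
{code}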






[jira] [Created] (SPARK-28564) Access history application defaults to the last attempt id

2019-07-30 Thread dzcxzl (JIRA)
dzcxzl created SPARK-28564:
--

 Summary: Access history application defaults to the last attempt id
 Key: SPARK-28564
 URL: https://issues.apache.org/jira/browse/SPARK-28564
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: dzcxzl


When we set spark.history.ui.maxApplications to a small value, some apps 
cannot be found through the page search.
If the URL is constructed manually (http://localhost:18080/history/local-xxx), 
it can be accessed as long as the app has no attempt id.
But for an app with multiple attempts, such a URL cannot be accessed and the 
page displays Not Found.
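
For illustration (app and attempt ids below are placeholders): once an app 
has attempts, the attempt id must be part of the path, which is why 
defaulting the bare URL to the last attempt id would make the hand-built URL 
usable in both cases.

{code}
http://localhost:18080/history/local-xxx    <- works only when the app has no attempts
http://localhost:18080/history/app-xxx/2    <- attempt id required once attempts exist
{code}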






[jira] [Updated] (SPARK-28564) Access history application defaults to the last attempt id

2019-07-30 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-28564:
---
Description: 
When we set spark.history.ui.maxApplications to a small value, some apps 
cannot be found through the page search.
If the URL is constructed manually (http://localhost:18080/history/local-xxx), 
it can be accessed as long as the app has no attempt id.
But for an app with multiple attempts, such a URL cannot be accessed and the 
page displays Not Found.

  was:
When we set spark.history.ui.maxApplications to a small value, some apps 
cannot be found through the page search.
If the URL is constructed manually (http://localhost:18080/history/local-xxx), 
it can be accessed as long as the app has no attempt id.
But for an app with multiple attempts, such a URL cannot be accessed and the 
page displays Not Found.


> Access history application defaults to the last attempt id
> --
>
> Key: SPARK-28564
> URL: https://issues.apache.org/jira/browse/SPARK-28564
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> When we set spark.history.ui.maxApplications to a small value, some apps 
> cannot be found through the page search.
> If the URL is constructed manually 
> (http://localhost:18080/history/local-xxx), it can be accessed as long as 
> the app has no attempt id.
> But for an app with multiple attempts, such a URL cannot be accessed and 
> the page displays Not Found.






[jira] [Created] (SPARK-46943) Support for configuring ShuffledHashJoin plan size Threshold

2024-02-01 Thread dzcxzl (Jira)
dzcxzl created SPARK-46943:
--

 Summary: Support for configuring ShuffledHashJoin plan size 
Threshold
 Key: SPARK-46943
 URL: https://issues.apache.org/jira/browse/SPARK-46943
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: dzcxzl









[jira] [Updated] (SPARK-46943) Support for configuring ShuffledHashJoin plan size Threshold

2024-02-01 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-46943:
---
Description: 
When we set `spark.sql.join.preferSortMergeJoin=false`, we may get the 
following error.
 
{code:java}
org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to build 
hash relation, got 478549889 bytes
at 
org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:813)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:761)
at 
org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:1064)
at 
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
at 
org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:75)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.init(Unknown
 Source)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:775)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:771)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915) 
{code}
 
This happens because, when converting SMJ to SHJ, the planner only checks 
whether the plan size is smaller than `conf.autoBroadcastJoinThreshold * 
conf.numShufflePartitions`. 
When the configured `numShufflePartitions` is large enough, the conversion to 
SHJ happens easily, and building the hash relation on the executor then fails 
due to insufficient memory.
 
https://github.com/apache/spark/blob/223afea9960c7ef1a4c8654e043e860f6c248185/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L505-L513
 

> Support for configuring ShuffledHashJoin plan size Threshold
> 
>
> Key: SPARK-46943
> URL: https://issues.apache.org/jira/browse/SPARK-46943
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>
> When we set `spark.sql.join.preferSortMergeJoin=false`, we may get the 
> following error.
>  
> {code:java}
> org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to 
> build hash relation, got 478549889 bytes
> at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
> at 
> org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
> at 
> org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:813)
> at 
> org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:761)
> at 
> org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:1064)
> at 
> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
> at 
> org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:75)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.init(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:775)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:771)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915) 
> {code}
>  
> This happens because, when converting SMJ to SHJ, the planner only checks 
> whether the plan size is smaller than `conf.autoBroadcastJoinThreshold * 
> conf.numShufflePartitions`. 
> When the configured `numShufflePartitions` is large enough, the conversion 
> to SHJ happens easily, and building the hash relation on the executor then 
> fails due to insufficient memory.
>  
> https://github.com/apache/spark/blob/223afea9960c7ef1a4c8654e043e860f6c248185/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L505-L513
>  
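
A sketch of the check at issue and the proposed knob (the parameter name here 
is an assumption; the ticket only asks for a configurable threshold): bound 
the plan size directly instead of letting the bound scale with 
numShufflePartitions.

{code:scala}
// Hypothetical planner predicate; names are assumptions for illustration.
def canConvertToShuffledHashJoin(
    planBytes: BigInt,
    autoBroadcastJoinThreshold: Long,
    numShufflePartitions: Int,
    shjPlanSizeThreshold: Long): Boolean = {
  // Current behavior: the bound grows with the partition count, so a large
  // numShufflePartitions lets almost any plan qualify.
  val legacyBound = BigInt(autoBroadcastJoinThreshold) * numShufflePartitions
  // Proposed: also cap the plan size with an independent, configurable limit.
  planBytes < legacyBound && planBytes < BigInt(shjPlanSizeThreshold)
}
{code}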






[jira] [Updated] (SPARK-46943) Support for configuring ShuffledHashJoin plan size Threshold

2024-02-01 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-46943:
---
Description: 
When we set `spark.sql.join.preferSortMergeJoin=false`, we may get the 
following error.
 
{code:java}
org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to build 
hash relation, got 478549889 bytes
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
    at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
    at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:813)
    at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:761)
    at 
org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:1064)
    at 
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
    at 
org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:75)
    at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.init(Unknown
 Source)
    at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:775)
    at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:771)
    at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915){code}
 
This happens because, when converting SMJ to SHJ, the planner only checks 
whether the plan size is smaller than `conf.autoBroadcastJoinThreshold * 
conf.numShufflePartitions`. 
When the configured `numShufflePartitions` is large enough, the conversion to 
SHJ happens easily, and building the hash relation on the executor then fails 
due to insufficient memory.
 
[https://github.com/apache/spark/blob/223afea9960c7ef1a4c8654e043e860f6c248185/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L505-L513]
 

  was:
When we set `spark.sql.join.preferSortMergeJoin=false`, we may get the 
following error.
 
{code:java}
org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to build 
hash relation, got 478549889 bytes
at 
org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:813)
at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:761)
at 
org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:1064)
at 
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
at 
org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:75)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.init(Unknown
 Source)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:775)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:771)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915) 
{code}
 
This happens because, when converting SMJ to SHJ, the planner only checks 
whether the plan size is smaller than `conf.autoBroadcastJoinThreshold * 
conf.numShufflePartitions`. 
When the configured `numShufflePartitions` is large enough, the conversion to 
SHJ happens easily, and building the hash relation on the executor then fails 
due to insufficient memory.
 
https://github.com/apache/spark/blob/223afea9960c7ef1a4c8654e043e860f6c248185/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L505-L513
 


> Support for configuring ShuffledHashJoin plan size Threshold
> 
>
> Key: SPARK-46943
> URL: https://issues.apache.org/jira/browse/SPARK-46943
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>
> When we set `spark.sql.join.preferSortMergeJoin=false`, we may get the 
> following error.
>  
> {code:java}
> org.apache.spark.SparkException: Can't acquire 1073741824 bytes memory to 
> build hash relation, got 478549889 bytes
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.cannotAcquireMemoryToBuildLongHashedRelationError(QueryExecutionErrors.scala:795)
>     at 
> org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.ensureAcquireMemory(HashedRelation.scala:581)
>     at 
> org.apache.spark.sql.

[jira] [Created] (SPARK-47456) Support ORC Brotli codec

2024-03-18 Thread dzcxzl (Jira)
dzcxzl created SPARK-47456:
--

 Summary: Support ORC Brotli codec
 Key: SPARK-47456
 URL: https://issues.apache.org/jira/browse/SPARK-47456
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: dzcxzl









[jira] [Created] (SPARK-47799) Preserve parameter information when using SBT package jar

2024-04-10 Thread dzcxzl (Jira)
dzcxzl created SPARK-47799:
--

 Summary: Preserve parameter information when using SBT package jar
 Key: SPARK-47799
 URL: https://issues.apache.org/jira/browse/SPARK-47799
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.1
Reporter: dzcxzl









[jira] [Created] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data

2024-04-28 Thread dzcxzl (Jira)
dzcxzl created SPARK-48037:
--

 Summary: SortShuffleWriter lacks shuffle write related metrics 
resulting in potentially inaccurate data
 Key: SPARK-48037
 URL: https://issues.apache.org/jira/browse/SPARK-48037
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.1.0, 3.0.1
Reporter: dzcxzl









[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data

2024-04-29 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-48037:
---
Affects Version/s: 3.3.0
   (was: 3.1.0)
   (was: 3.0.1)

> SortShuffleWriter lacks shuffle write related metrics resulting in 
> potentially inaccurate data
> --
>
> Key: SPARK-48037
> URL: https://issues.apache.org/jira/browse/SPARK-48037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48070) Support AdaptiveQueryExecSuite to skip check results

2024-04-30 Thread dzcxzl (Jira)
dzcxzl created SPARK-48070:
--

 Summary: Support AdaptiveQueryExecSuite to skip check results
 Key: SPARK-48070
 URL: https://issues.apache.org/jira/browse/SPARK-48070
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0.0
Reporter: dzcxzl









[jira] [Created] (SPARK-44556) Reuse `OrcTail` when enable vectorizedReader

2023-07-26 Thread dzcxzl (Jira)
dzcxzl created SPARK-44556:
--

 Summary: Reuse `OrcTail` when enable vectorizedReader
 Key: SPARK-44556
 URL: https://issues.apache.org/jira/browse/SPARK-44556
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.1
Reporter: dzcxzl









[jira] [Created] (SPARK-44583) `spark.*.io.connectionCreationTimeout` parameter documentation

2023-07-28 Thread dzcxzl (Jira)
dzcxzl created SPARK-44583:
--

 Summary: `spark.*.io.connectionCreationTimeout` parameter 
documentation
 Key: SPARK-44583
 URL: https://issues.apache.org/jira/browse/SPARK-44583
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.4.1
Reporter: dzcxzl









[jira] [Created] (SPARK-44650) `spark.executor.defaultJavaOptions` Check illegal java options

2023-08-02 Thread dzcxzl (Jira)
dzcxzl created SPARK-44650:
--

 Summary: `spark.executor.defaultJavaOptions` Check illegal java 
options
 Key: SPARK-44650
 URL: https://issues.apache.org/jira/browse/SPARK-44650
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: dzcxzl









[jira] [Commented] (SPARK-33458) Hive partition pruning support Contains, StartsWith and EndsWith predicate

2023-09-27 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769889#comment-17769889
 ] 

dzcxzl commented on SPARK-33458:


After [HIVE-22900|https://issues.apache.org/jira/browse/HIVE-22900] (HMS 4.0), 
LIKE-style partition filters support direct SQL. Spark currently emits the .* 
pattern, which may cause incorrect results, because .* is JDO query syntax; 
direct SQL must use % instead.
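
For illustration (hedged; the exact escaping rules live in HMS): the same 
partition predicate needs different pattern syntax on the two metastore 
paths, so emitting the JDO form unconditionally can change semantics once HMS 
runs the filter as direct SQL.

{code:scala}
// Hypothetical mapping for a filter on the value "ab", illustration only:
//   Contains    JDO ".*ab.*"  ->  direct SQL "%ab%"
//   StartsWith  JDO "ab.*"    ->  direct SQL "ab%"
//   EndsWith    JDO ".*ab"    ->  direct SQL "%ab"
def toDirectSqlPattern(jdoPattern: String): String =
  jdoPattern.replace(".*", "%")
{code}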

> Hive partition pruning support Contains, StartsWith and EndsWith predicate
> --
>
> Key: SPARK-33458
> URL: https://issues.apache.org/jira/browse/SPARK-33458
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Hive partition pruning can support Contains, StartsWith and EndsWith 
> predicate:
> https://github.com/apache/hive/blob/0c2c8a7f57330880f156466526bc0fdc94681035/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L1074-L1075
> https://github.com/apache/hive/commit/0c2c8a7f57330880f156466526bc0fdc94681035#diff-b1200d4259fafd48d7bbd0050e89772218813178f68461a2e82551c52319b282






[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2023-01-05 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654965#comment-17654965
 ] 

dzcxzl commented on SPARK-35744:


This problem should be solved by the upgrade to Avro 1.11.0 
([AVRO-3186|https://issues.apache.org/jira/browse/AVRO-3186]) done in 
[SPARK-37206|https://issues.apache.org/jira/browse/SPARK-37206], so we should 
be able to close this ticket.

> Performance degradation in avro SpecificRecordBuilders
> --
>
> Key: SPARK-35744
> URL: https://issues.apache.org/jira/browse/SPARK-35744
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Steven Aerts
>Priority: Minor
>
> Creating this bug to let you know that when we tested out Spark 3.2.0 we saw 
> a significant performance degradation where our code was handling Avro 
> specific record objects.  This slowed down some of our jobs by a factor of 4.
> Spark 3.2.0 bumps the Avro version from 1.8.2 to 1.10.2.
> The degradation was caused by a change introduced in Avro 1.9.0.  This change 
> degrades performance when creating Avro specific records in certain 
> classloader topologies, like the ones used in Spark.
> We notified and [proposed|https://github.com/apache/avro/pull/1253] a simple 
> fix upstream in the Avro project.  (Links contain more details.)
> It is unclear to us how many other projects use Avro specific records in a 
> Spark context and will be impacted by this degradation.
>  Feel free to close this issue if you think it is too much of a corner case.






[jira] [Created] (SPARK-42366) Log output shuffle data corruption diagnose causes

2023-02-06 Thread dzcxzl (Jira)
dzcxzl created SPARK-42366:
--

 Summary: Log output shuffle data corruption diagnose causes
 Key: SPARK-42366
 URL: https://issues.apache.org/jira/browse/SPARK-42366
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.2.0
Reporter: dzcxzl









[jira] [Updated] (SPARK-42366) Log output shuffle data corruption diagnose cause

2023-02-06 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-42366:
---
Summary: Log output shuffle data corruption diagnose cause  (was: Log 
output shuffle data corruption diagnose causes)

> Log output shuffle data corruption diagnose cause
> -
>
> Key: SPARK-42366
> URL: https://issues.apache.org/jira/browse/SPARK-42366
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Minor
>







[jira] [Updated] (SPARK-42366) Log shuffle data corruption diagnose cause

2023-02-06 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-42366:
---
Summary: Log shuffle data corruption diagnose cause  (was: Log output 
shuffle data corruption diagnose cause)

> Log shuffle data corruption diagnose cause
> --
>
> Key: SPARK-42366
> URL: https://issues.apache.org/jira/browse/SPARK-42366
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Minor
>







[jira] [Created] (SPARK-24677) MedianHeap is empty when speculation is enabled, causing the SparkContext to stop

2018-06-28 Thread dzcxzl (JIRA)
dzcxzl created SPARK-24677:
--

 Summary: MedianHeap is empty when speculation is enabled, causing 
the SparkContext to stop
 Key: SPARK-24677
 URL: https://issues.apache.org/jira/browse/SPARK-24677
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: dzcxzl


The change introduced in SPARK-23433 may cause the SparkContext to stop.
{code:java}
ERROR Utils: uncaught error in thread task-scheduler-speculation, stopping 
SparkContext
java.util.NoSuchElementException: MedianHeap is empty.
at org.apache.spark.util.collection.MedianHeap.median(MedianHeap.scala:83)
at 
org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks(TaskSetManager.scala:968)
at 
org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
at 
org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.scheduler.Pool.checkSpeculatableTasks(Pool.scala:93)
at 
org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
at 
org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
{code}
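
A minimal sketch of the guard implied by the updated summary (a hypothetical 
shape; MedianHeap itself is Spark-internal): consult the heap only when it is 
non-empty, so the speculation pass skips task sets with no finished tasks yet.

{code:scala}
// Hypothetical stand-in for Spark's internal MedianHeap-based check.
def speculationThreshold(durations: Seq[Double], multiplier: Double): Option[Double] =
  if (durations.isEmpty) None // skip instead of throwing NoSuchElementException
  else {
    val sorted = durations.sorted
    Some(sorted(sorted.length / 2) * multiplier) // median of finished durations
  }
{code}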






[jira] [Updated] (SPARK-24677) Avoid NoSuchElementException from MedianHeap

2018-07-10 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-24677:
---
Summary: Avoid NoSuchElementException from MedianHeap  (was: MedianHeap is 
empty when speculation is enabled, causing the SparkContext to stop)

> Avoid NoSuchElementException from MedianHeap
> 
>
> Key: SPARK-24677
> URL: https://issues.apache.org/jira/browse/SPARK-24677
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: dzcxzl
>Priority: Critical
>
> The change introduced in SPARK-23433 may cause the SparkContext to stop.
> {code:java}
> ERROR Utils: uncaught error in thread task-scheduler-speculation, stopping 
> SparkContext
> java.util.NoSuchElementException: MedianHeap is empty.
> at org.apache.spark.util.collection.MedianHeap.median(MedianHeap.scala:83)
> at 
> org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks(TaskSetManager.scala:968)
> at 
> org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
> at 
> org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at org.apache.spark.scheduler.Pool.checkSpeculatableTasks(Pool.scala:93)
> at 
> org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
> at 
> org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
> {code}






[jira] [Created] (SPARK-24257) LongToUnsafeRowMap calculate the new size may be wrong

2018-05-12 Thread dzcxzl (JIRA)
dzcxzl created SPARK-24257:
--

 Summary: LongToUnsafeRowMap calculate the new size may be wrong
 Key: SPARK-24257
 URL: https://issues.apache.org/jira/browse/SPARK-24257
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: dzcxzl


LongToUnsafeRowMap calculates its new size simply by multiplying the current 
size by 2.

The grown allocation may still not be large enough to store the data, so some 
data is lost and the data read back is dirty.
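
A sketch of the sizing bug (a hypothetical reduction of the grow logic): 
doubling once is not always enough, so the growth step has to keep doubling 
until the required capacity actually fits.

{code:scala}
// Hypothetical grow logic; not the actual LongToUnsafeRowMap code.
def grownSize(current: Long, needed: Long): Long = {
  var size = current * 2           // a single doubling may still be < needed
  while (size < needed) size *= 2  // keep doubling until the data fits
  size
}
{code}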






[jira] [Created] (SPARK-24317) Float-point numbers are displayed with different precision in ThriftServer2

2018-05-18 Thread dzcxzl (JIRA)
dzcxzl created SPARK-24317:
--

 Summary: Float-point numbers are displayed with different 
precision in ThriftServer2
 Key: SPARK-24317
 URL: https://issues.apache.org/jira/browse/SPARK-24317
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: dzcxzl


When querying floating-point numbers, the values displayed in beeline or over 
JDBC have different precision.
{code:java}
SELECT CAST(1.23 AS FLOAT)
Result:
1.2300000190734863
{code}
According to these two JIRAs:

[HIVE-11802|https://issues.apache.org/jira/browse/HIVE-11802]
[HIVE-11832|https://issues.apache.org/jira/browse/HIVE-11832]

we make a slight modification to the Spark Hive Thrift Server.
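
A minimal illustration of the mismatch (assumed mechanics, consistent with 
the HIVE tickets above): widening the float to double exposes digits beyond 
the float's precision, while rendering at float precision gives the expected 
text.

{code:scala}
val f: Float = 1.23f
println(f.toDouble) // 1.2300000190734863 (widened to double, extra digits)
println(f.toString) // 1.23               (rendered at float precision)
{code}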






[jira] [Created] (SPARK-23230) Error by creating a data table when using hive.default.fileformat=orc

2018-01-26 Thread dzcxzl (JIRA)
dzcxzl created SPARK-23230:
--

 Summary: Error by creating a data table  when using 
hive.default.fileformat=orc
 Key: SPARK-23230
 URL: https://issues.apache.org/jira/browse/SPARK-23230
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.1, 2.2.0, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
Reporter: dzcxzl


When hive.default.fileformat is set to another file format, creating a 
textfile table causes a SerDe error.
 We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
default SerDe for both textfile and sequencefile.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}
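
A sketch of the intended defaulting (a hypothetical helper, not Spark's 
actual code): the SerDe should follow the format named in the DDL, with 
textfile and sequencefile always mapping to LazySimpleSerDe regardless of 
hive.default.fileformat.

{code:scala}
// Hypothetical mapping from the STORED AS clause to a SerDe; illustration only.
def defaultSerde(storedAs: String): String = storedAs.toLowerCase match {
  case "textfile" | "sequencefile" =>
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  case "orc" => "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
  case other => sys.error(s"no default SerDe wired up for: $other")
}
{code}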






[jira] [Updated] (SPARK-23230) When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-11 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23230:
---
Description: 
When hive.default.fileformat is set to another file format, creating a 
textfile table causes a SerDe error.
 We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
default SerDe for both textfile and sequencefile.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}

  was:
When hive.default.fileformat is set to another file format, creating a 
textfile table causes a SerDe error.
 We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
default SerDe for both textfile and sequencefile.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}

Summary: When hive.default.fileformat is other kinds of file types, 
create textfile table cause a serde error  (was: Error by creating a data table 
 when using hive.default.fileformat=orc)

> When hive.default.fileformat is other kinds of file types, create textfile 
> table cause a serde error
> 
>
> Key: SPARK-23230
> URL: https://issues.apache.org/jira/browse/SPARK-23230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> When hive.default.fileformat is set to another file format, creating a 
> textfile table causes a SerDe error.
>  We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
> default SerDe for both textfile and sequencefile.
> {code:java}
> set hive.default.fileformat=orc;
> create table tbl( i string ) stored as textfile;
> desc formatted tbl;
> Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}






[jira] [Updated] (SPARK-23230) When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-11 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23230:
---
Description: 
When hive.default.fileformat is set to another file format, creating a 
textfile table causes a SerDe error.
 We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
default SerDe for both textfile and sequencefile.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}
 
{code:java}
set hive.default.fileformat=orc;
create table tbl stored as textfile
as
select 1;
{code}
It fails because the wrong SerDe is used:
{code:java}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to 
org.apache.hadoop.io.BytesWritable
at 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
... 16 more
{code}
 

  was:
When hive.default.fileformat is set to another file format, creating a 
textfile table causes a SerDe error.
 We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the 
default SerDe for both textfile and sequencefile.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}
 
{code:java}
set hive.default.fileformat=orc;
create table tbl stored as textfile
as
select 1;
{code}
It fails because the wrong SerDe is used:
{code:java}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to 
org.apache.hadoop.io.BytesWritable
at 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
... 16 more
{code}
 


> When hive.default.fileformat is set to another file format, creating a 
> textfile table causes a serde error
> 
>
> Key: SPARK-23230
> URL: https://issues.apache.org/jira/browse/SPARK-23230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> When hive.default.fileformat is set to another file format, creating a 
> textfile table causes a serde error.
>  We should treat the default serde of both textfile and sequencefile as 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
> {code:java}
> set hive.default.fileformat=orc;
> create table tbl( i string ) stored as textfile;
> desc formatted tbl;
> Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat  org.apache.

[jira] [Updated] (SPARK-23230) When hive.default.fileformat is set to another file format, creating a textfile table causes a serde error

2018-02-11 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23230:
---
Description: 
When hive.default.fileformat is set to another file format, creating a textfile 
table causes a serde error.
 We should treat the default serde of both textfile and sequencefile as 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}
 
{code:java}
set hive.default.fileformat=orc;
create table tbl stored as textfile
as
select  1



{code}
{{It failed because it used the wrong SERDE}}
{code:java}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to 
org.apache.hadoop.io.BytesWritable
at 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
... 16 more
{code}
 

  was:
When hive.default.fileformat is other kinds of file types, create textfile 
table cause a serde error.
 We should take the default type of textfile and sequencefile both as 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
{code:java}
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}


> When hive.default.fileformat is set to another file format, creating a 
> textfile table causes a serde error
> 
>
> Key: SPARK-23230
> URL: https://issues.apache.org/jira/browse/SPARK-23230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> When hive.default.fileformat is set to another file format, creating a 
> textfile table causes a serde error.
>  We should treat the default serde of both textfile and sequencefile as 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
> {code:java}
> set hive.default.fileformat=orc;
> create table tbl( i string ) stored as textfile;
> desc formatted tbl;
> Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat  org.apache.hadoop.mapred.TextInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{code}
>  
> {code:java}
> set hive.default.fileformat=orc;
> create table tbl stored as textfile
> as
> select  1
> {code}
> {{It failed because it used the wrong SERDE}}
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to 
> org.apache.hadoop.io.BytesWritable
>   at 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
>   at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$dataso

[jira] [Created] (SPARK-23603) When the length of the json is in a range, get_json_object will result in missing tail data

2018-03-05 Thread dzcxzl (JIRA)
dzcxzl created SPARK-23603:
--

 Summary: When the length of the json is in a range, get_json_object 
will result in missing tail data
 Key: SPARK-23603
 URL: https://issues.apache.org/jira/browse/SPARK-23603
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0, 2.2.0, 2.0.0
Reporter: dzcxzl


Jackson (>= 2.7.7) fixes a bug that could drop the tail of a value when the 
length of the value falls within a certain range.

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

 

spark-shell:

 
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
correct result: 3000

 

 

There are two solutions.
One is to bump the Jackson version to 2.7.7.
The other is to replace writeRaw(char[] text, int offset, int len) with 
writeRaw(String text); a sketch of the latter follows.
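
A hedged sketch of the second option, assuming a Jackson JsonGenerator and the 
matched character slice: writeRaw(String) copies the characters up front and so 
sidesteps the buffer-boundary bug that writeRaw(char[], int, int) had before 
jackson-core 2.7.7.
{code:java}
import com.fasterxml.jackson.core.JsonGenerator

// Materialize the slice as a String before handing it to the generator.
def writeRawSafely(gen: JsonGenerator, chars: Array[Char], offset: Int, len: Int): Unit = {
  gen.writeRaw(new String(chars, offset, len))
}
{code}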

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23603) When the length of the json is in a range, get_json_object will result in missing tail data

2018-03-05 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23603:
---
Labels: ca  (was: )

> When the length of the json is in a range, get_json_object will result in 
> missing tail data
> --
>
> Key: SPARK-23603
> URL: https://issues.apache.org/jira/browse/SPARK-23603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: dzcxzl
>Priority: Major
>  Labels: ca
>
> Jackson(>=2.7.7) fixes the possibility of missing tail data when the length 
> of the value is in a range
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
>  
> spark-shell:
>  
> {code:java}
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}
> correct result : 3000
>  
>  
> There are two solutions
> One is
> bump jackson version to 2.7.7
> The other one is
> Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23603) When the length of the json is in a range, get_json_object will result in missing tail data

2018-03-05 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23603:
---
Labels:   (was: ca)

> When the length of the json is in a range, get_json_object will result in 
> missing tail data
> --
>
> Key: SPARK-23603
> URL: https://issues.apache.org/jira/browse/SPARK-23603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: dzcxzl
>Priority: Major
>
> Jackson(>=2.7.7) fixes the possibility of missing tail data when the length 
> of the value is in a range
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
>  
> spark-shell:
>  
> {code:java}
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}
> correct result : 3000
>  
>  
> There are two solutions
> One is
> bump jackson version to 2.7.7
> The other one is
> Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23603) When the length of the json is in a range, get_json_object will result in missing tail data

2018-03-12 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23603:
---
Description: 
Jackson (>= 2.7.7) fixes a bug that could drop the tail of a value when the 
length of the value falls within a certain range.

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

 

spark-shell:
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
expected result: 3000
actual result: 2991

There are two solutions.
 One is to bump the Jackson version to 2.7.7.
 The other is to replace writeRaw(char[] text, int offset, int len) with 
writeRaw(String text).

 

  was:
Jackson(>=2.7.7) fixes the possibility of missing tail data when the length of 
the value is in a range

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

 

spark-shell:

 
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
correct result : 3000

 

 

There are two solutions
One is
bump jackson version to 2.7.7
The other one is
Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)

 


> When the length of the json is in a range, get_json_object will result in 
> missing tail data
> --
>
> Key: SPARK-23603
> URL: https://issues.apache.org/jira/browse/SPARK-23603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: dzcxzl
>Priority: Major
>
> Jackson(>=2.7.7) fixes the possibility of missing tail data when the length 
> of the value is in a range
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
>  
> spark-shell:
> {code:java}
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}
> expect result : 3000 
> actual result  : 2991
> There are two solutions
>  One is
>  bump jackson version to 2.7.7
>  The other one is
>  Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23603) When the length of the json is in a range, get_json_object will result in missing tail data

2018-03-12 Thread dzcxzl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-23603:
---
Description: 
Jackson (>= 2.7.7) fixes a bug that could drop the tail of a value when the 
length of the value falls within a certain range.

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

spark-shell:
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
expected result: 3000
actual result: 2991

There are two solutions.
 One is to *bump Jackson from 2.6.7 & 2.6.7.1 to 2.7.7*.
 The other is to *replace writeRaw(char[] text, int offset, int len) with 
writeRaw(String text)*.

 

  was:
Jackson(>=2.7.7) fixes the possibility of missing tail data when the length of 
the value is in a range

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

 

spark-shell:
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
expect result : 3000 
actual result  : 2991

There are two solutions
 One is
 bump jackson version to 2.7.7
 The other one is
 Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)

 


> When the length of the json is in a range, get_json_object will result in 
> missing tail data
> --
>
> Key: SPARK-23603
> URL: https://issues.apache.org/jira/browse/SPARK-23603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: dzcxzl
>Priority: Major
>
> Jackson(>=2.7.7) fixes the possibility of missing tail data when the length 
> of the value is in a range
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
> spark-shell:
> {code:java}
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}
> expect result : 3000 
>  actual result  : 2991
> There are two solutions
>  One is
> *Bump jackson from 2.6.7&2.6.7.1 to 2.7.7*
>  The other one is
>  *Replace writeRaw(char[] text, int offset, int len) with writeRaw(String 
> text)*
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32426) UI shows SQL after variable substitution

2020-07-24 Thread dzcxzl (Jira)
dzcxzl created SPARK-32426:
--

 Summary: UI shows SQL after variable substitution
 Key: SPARK-32426
 URL: https://issues.apache.org/jira/browse/SPARK-32426
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: dzcxzl


When SQL containing variables is submitted, the SQL displayed by the UI does not 
have the variables substituted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32508) Disallow empty part col values in partition spec before static partition writing

2020-07-31 Thread dzcxzl (Jira)
dzcxzl created SPARK-32508:
--

 Summary: Disallow empty part col values in partition spec before 
static partition writing
 Key: SPARK-32508
 URL: https://issues.apache.org/jira/browse/SPARK-32508
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: dzcxzl


When writing to a static partition whose value in the partition spec is empty, 
the error is only reported after all tasks have completed.
We can reject such a partition spec before submitting the tasks; a sketch of the 
check follows the stack trace below.

 
{code:java}
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key 
d is null or empty;
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:113)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getPartitionOption(HiveExternalCatalog.scala:1212)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getPartitionOption(ExternalCatalogWithListener.scala:240)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:276)
{code}
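
A minimal sketch of the up-front check proposed here (hypothetical helper, not 
the actual patch): fail the query before any task is launched.
{code:java}
// A static partition spec maps column -> Some(value); None means dynamic.
def validateStaticPartitionSpec(spec: Map[String, Option[String]]): Unit = {
  spec.foreach { case (key, value) =>
    require(value.forall(_.trim.nonEmpty),
      s"Partition spec is invalid: value for key $key is null or empty")
  }
}
{code}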
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24317) Floating-point numbers are displayed with different precision in ThriftServer2

2018-10-11 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl resolved SPARK-24317.

Resolution: Duplicate

> Floating-point numbers are displayed with different precision in ThriftServer2
> ---
>
> Key: SPARK-24317
> URL: https://issues.apache.org/jira/browse/SPARK-24317
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: dzcxzl
>Priority: Minor
>
> When querying floating-point numbers, the values displayed in beeline or over 
> JDBC have different precision.
> {code:java}
> SELECT CAST(1.23 AS FLOAT)
> Result:
> 1.2300000190734863
> {code}
> According to these two jira:
> [HIVE-11802|https://issues.apache.org/jira/browse/HIVE-11802]
> [HIVE-11832|https://issues.apache.org/jira/browse/HIVE-11832]
> Make a slight modification to the spark hive thrift server.
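
A quick spark-shell illustration of where the stray digits come from 
(independent of the Thrift server change itself): the float is widened to a 
double before being formatted.
{code:java}
// 1.23f has no exact binary representation; formatting it as a Double
// exposes noise that Float.toString would round away.
val f = 1.23f
println(f.toString)           // 1.23
println(f.toDouble.toString)  // 1.2300000190734863
{code}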



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29657) Iterator spill supporting radix sort with null prefix

2019-10-30 Thread dzcxzl (Jira)
dzcxzl created SPARK-29657:
--

 Summary: Iterator spill supporting radix sort with null prefix
 Key: SPARK-29657
 URL: https://issues.apache.org/jira/browse/SPARK-29657
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: dzcxzl


In the case of radix sort, when records are inserted (insertRecord) with a null 
keyPrefix, the iterator type returned by getSortedIterator is ChainedIterator.
Currently ChainedIterator does not support spill, so UnsafeExternalSorter can 
take up a lot of execution memory; allocatePage then fails and throws 
SparkOutOfMemoryError: Unable to acquire xxx bytes of memory, got 0.

The following is a log of an error we encountered in the production environment.

[Executor task launch worker for task 66055] INFO TaskMemoryManager: Memory 
used in task 66055
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@39dd866e: 
64.0 KB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@74d17927: 
4.6 GB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@31478f9c: 
61.0 MB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: 0 bytes of 
memory were used by task 66055 but are not associated with specific consumers
[Executor task launch worker for task 66055] INFO TaskMemoryManager: 4962998749 
bytes of memory are used for execution and 2218326 bytes of memory are used for 
storage
[Executor task launch worker for task 66055] ERROR Executor: Exception in task 
42.3 in stage 29.0 (TID 66055)
SparkOutOfMemoryError: Unable to acquire 3436 bytes of memory, got 0
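
A spark-shell sketch of the kind of workload that reaches this path (sizes are 
illustrative; the data must be large enough to exhaust execution memory):
{code:java}
// A nullable sort key yields a null-prefix partition plus radix-sorted runs,
// which getSortedIterator stitches together with a ChainedIterator.
val df = spark.range(0L, 100000000L).selectExpr(
  "if(id % 2 = 0, cast(null as bigint), id) as key", "id as value")
df.orderBy("key").write.mode("overwrite").parquet("/tmp/sorted_by_nullable_key")
{code}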



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29657) Iterator spill supporting radix sort with null prefix

2019-10-31 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-29657:
---
Issue Type: Bug  (was: Improvement)

> Iterator spill supporting radix sort with null prefix
> -
>
> Key: SPARK-29657
> URL: https://issues.apache.org/jira/browse/SPARK-29657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> In the case of radix sort, when the insertRecord part of the keyPrefix is 
> null, the iterator type returned by getSortedIterator is ChainedIterator.
> Currently ChainedIterator does not support spill, causing 
> UnsafeExternalSorter to take up a lot of execution memory, allocatePage 
> fails, throw SparkOutOfMemoryError Unable to acquire xxx bytes of memory, got > 0
> The following is a log of an error we encountered in the production 
> environment.
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: Memory 
> used in task 66055
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
> by 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@39dd866e: 
> 64.0 KB
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
> by 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@74d17927: 
> 4.6 GB
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired 
> by 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@31478f9c: 
> 61.0 MB
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: 0 bytes 
> of memory were used by task 66055 but are not associated with specific 
> consumers
> [Executor task launch worker for task 66055] INFO TaskMemoryManager: 
> 4962998749 bytes of memory are used for execution and 2218326 bytes of memory 
> are used for storage
> [Executor task launch worker for task 66055] ERROR Executor: Exception in 
> task 42.3 in stage 29.0 (TID 66055)
> SparkOutOfMemoryError: Unable to acquire 3436 bytes of memory, got 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29943) Improve error messages for unsupported data type

2019-11-18 Thread dzcxzl (Jira)
dzcxzl created SPARK-29943:
--

 Summary: Improve error messages for unsupported data type
 Key: SPARK-29943
 URL: https://issues.apache.org/jira/browse/SPARK-29943
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: dzcxzl


When Spark reads a hive table and encounters an unsupported field type, the 
exception message contains only the unsupported type, so the user cannot tell 
which field of which table is affected.

org.apache.spark.SparkException: Cannot recognize hive type string: void
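
A sketch of the enrichment this asks for (the wording and helper are 
hypothetical):
{code:java}
// Carry the column, database and table through to the error message.
def cannotRecognizeHiveTypeError(
    fieldType: String, fieldName: String, db: String, table: String): Throwable =
  new org.apache.spark.SparkException(
    s"Cannot recognize hive type string: $fieldType, column: $fieldName, db: $db, table: $table")
{code}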



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33147) Avoid distribute user jar from driver in yarn client mode

2020-10-14 Thread dzcxzl (Jira)
dzcxzl created SPARK-33147:
--

 Summary: Avoid distribute user jar from driver in yarn client mode
 Key: SPARK-33147
 URL: https://issues.apache.org/jira/browse/SPARK-33147
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: dzcxzl


{code:java}
spark-submit --master yarn --deploy-mode client --num-executors 100 big_jar.jar
{code}
When the number of requested executors is large and the jar is big, every 
executor pulls the jar from the driver, driver network traffic is high, and a 
timeout may occur. The driver and the executors in the yarn cluster may not 
even be in the same data center.
{code:java}
20/10/04 00:46:02,269 [rpc-server-3-13] ERROR TransportRequestHandler: Error 
sending result StreamResponse{streamId=/jars/xxx-jar-with-dependencies.jar, 
byteCount=145417300, 
body=FileSegmentManagedBuffer{file=xxx-jar-with-dependencies.jar, offset=0, 
length=145417300}} to /x.x.x.x:33527; closing connection

{code}
 

We can automatically add the user's jar to --jars, which can avoid this problem.
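
A sketch of the idea (hypothetical helper; spark.yarn.dist.jars is YARN's 
distributed-cache jar list, so NodeManagers localize the jar from HDFS instead 
of fetching it from the driver):
{code:java}
import org.apache.spark.SparkConf

// Fold the application jar into the YARN distributed-cache jar list.
def distributeUserJarViaYarn(conf: SparkConf, userJar: String): Unit = {
  val dist = conf.getOption("spark.yarn.dist.jars").toSeq
  conf.set("spark.yarn.dist.jars", (dist :+ userJar).mkString(","))
}
{code}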

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-15 Thread dzcxzl (Jira)
dzcxzl created SPARK-33158:
--

 Summary: Check whether the executor and external service 
connection is available
 Key: SPARK-33158
 URL: https://issues.apache.org/jira/browse/SPARK-33158
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: dzcxzl


At present, the executor establishes a connection with the external shuffle 
service only once, at initialization, in order to register.

In yarn, the nodemanager may stop working and the shuffle service with it, 
while the container/executor process is still executing: ShuffleMapTask can 
still be executed, and the returned mapstatus still carries the address of the 
external shuffle service.
When the next stage reads the shuffle data, it cannot connect to the shuffle 
service, and the job finally fails.

The approach I thought of:
Before ShuffleMapTask starts to write data, check whether the connection is 
available, or periodically test whether the connection is healthy, as the 
driver and executor heartbeat check threads do. A minimal probe is sketched 
below.
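
A minimal reachability probe, as a hedged sketch (a plain TCP connect; a real 
check would go through Spark's transport layer):
{code:java}
import java.net.{InetSocketAddress, Socket}

// Returns true if the external shuffle service port accepts a TCP connection
// within the timeout; a dead NodeManager fails fast here instead of at the
// next stage's shuffle fetch.
def shuffleServiceReachable(host: String, port: Int, timeoutMs: Int = 5000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: java.io.IOException => false
  } finally {
    socket.close()
  }
}
{code}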

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl commented on SPARK-33158:


[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669]/[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898]
 provide the ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a nodemanager for a period of time. NM does 
not guarantee that all container processes are killed when stopping, so a 
container may still be executing while NM no longer provides the shuffle 
service, which causes fetch failures.

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once at initialization and registers.
> In yarn, nodemanager may stop working, shuffle service does not work, but the 
> container/executor process is still executing, ShuffleMapTask can still be 
> executed, and the returned result mapstatus is still the address of the 
> external shuffle service
> When the next stage reads shuffle data, it will not be connected to the 
> shuffle service.
> The final job execution failed.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or regularly test whether the connection is normal, such as the 
> driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 7:52 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a nodemanager for a period of time. NM does 
not guarantee that all container processes are killed when stopping, so a 
container may still be executing while NM no longer provides the shuffle 
service, which causes fetch failures.


was (Author: dzcxzl):
[SPARK-13​​669|https://issues.apache.org/jira/browse/SPARK-13669]/[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898]
 provides the ability to add host to the blacklist when fetch fails, 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we will stop nm or decommission nm for a period of time,nm does not 
guarantee that all container processes will be killed when stopping, it may 
appear that the container is still executing, nm does not provide shuffle 
service,which will cause fetch fail.

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once at initialization and registers.
> In yarn, nodemanager may stop working, shuffle service does not work, but the 
> container/executor process is still executing, ShuffleMapTask can still be 
> executed, and the returned result mapstatus is still the address of the 
> external shuffle service
> When the next stage reads shuffle data, it will not be connected to the 
> shuffle service.
> The final job execution failed.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or regularly test whether the connection is normal, such as the 
> driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 8:06 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a nodemanager for a period of time. NM does 
not guarantee that all container processes are killed when stopping, so a 
container may still be executing while NM no longer provides the shuffle 
service, which causes fetch failures. See 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files of the host, tasks may still be assigned to this host 
when the stage is rerun. Since the executor does not know whether the shuffle 
service is available, it continues to write data to disk.



was (Author: dzcxzl):
[SPARK-13​​669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provides the 
ability to add host to the blacklist when fetch fails, 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we will stop nm or decommission nm for a period of time,nm does not 
guarantee that all container processes will be killed when stopping, it may 
appear that the container is still executing, nm does not provide shuffle 
service,which will cause fetch fail.

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once at initialization and registers.
> In yarn, nodemanager may stop working, shuffle service does not work, but the 
> container/executor process is still executing, ShuffleMapTask can still be 
> executed, and the returned result mapstatus is still the address of the 
> external shuffle service
> When the next stage reads shuffle data, it will not be connected to the 
> shuffle service.
> The final job execution failed.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or regularly test whether the connection is normal, such as the 
> driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 8:08 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a nodemanager for a period of time. NM does 
not guarantee that all container processes are killed when stopping, so a 
container may still be executing while NM no longer provides the shuffle 
service, which causes fetch failures. See 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files of the host, tasks may still be assigned to this host 
when the stage is rerun. Since the executor does not know whether the shuffle 
service is available, it continues to write data to disk, and the next round of 
shuffle reads will fail again.



was (Author: dzcxzl):
[SPARK-13​​669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provides the 
ability to add host to the blacklist when fetch fails, 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we will stop nm or decommission nm for a period of time,nm does not 
guarantee that all container processes will be killed when stopping, it may 
appear that the container is still executing, nm does not provide shuffle 
service,which will cause fetch fail. 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files of the host, it may still be assigned to this host 
when the stage is rerun. Since the executor does not know whether the shuffle 
service is available, it continues to write data to disk.


> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once at initialization and registers.
> In yarn, nodemanager may stop working, shuffle service does not work, but the 
> container/executor process is still executing, ShuffleMapTask can still be 
> executed, and the returned result mapstatus is still the address of the 
> external shuffle service
> When the next stage reads shuffle data, it will not be connected to the 
> shuffle service.
> The final job execution failed.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or regularly test whether the connection is normal, such as the 
> driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27630) Stage retry causes totalRunningTasks calculation to be negative

2019-05-03 Thread dzcxzl (JIRA)
dzcxzl created SPARK-27630:
--

 Summary: Stage retry causes totalRunningTasks calculation to be 
negative
 Key: SPARK-27630
 URL: https://issues.apache.org/jira/browse/SPARK-27630
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: dzcxzl


In the case of a stage retry, the onTaskEnd event may be sent after the new 
stage attempt has been submitted. This causes the ExecutorAllocationManager to 
compute a negative number of currently running tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27706) Add SQL metrics of numOutputRows for BroadcastExchangeExec

2019-05-14 Thread dzcxzl (JIRA)
dzcxzl created SPARK-27706:
--

 Summary: Add SQL metrics of numOutputRows for BroadcastExchangeExec
 Key: SPARK-27706
 URL: https://issues.apache.org/jira/browse/SPARK-27706
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: dzcxzl


Add SQL metrics of numOutputRows for BroadcastExchangeExec



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48218) TransportClientFactory.createClient may NPE cause FetchFailedException

2024-05-09 Thread dzcxzl (Jira)
dzcxzl created SPARK-48218:
--

 Summary: TransportClientFactory.createClient may NPE cause 
FetchFailedException
 Key: SPARK-48218
 URL: https://issues.apache.org/jira/browse/SPARK-48218
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 4.0.0
Reporter: dzcxzl




{code:java}
org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1180)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:913)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)

Caused by: java.lang.NullPointerException
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178)
at 
org.apache.spark.network.shuffle.ExternalBlockStoreClient.lambda$fetchBlocks$0(ExternalBlockStoreClient.java:128)
at 
org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:154)
at 
org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:133)
at 
org.apache.spark.network.shuffle.ExternalBlockStoreClient.fetchBlocks(ExternalBlockStoreClient.java:139)
{code}
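
The NPE comes from TransportClientFactory.createClient dereferencing a pooled 
client slot; a hedged sketch of the shape of the guard (the pool representation 
and helper are stand-ins, not Spark's actual fields):
{code:java}
import org.apache.spark.network.client.TransportClient

// Reuse a pooled client only if the slot is set and the channel is alive;
// otherwise create and cache a fresh one.
def getOrCreate(pool: Array[TransportClient], idx: Int)
               (createNew: () => TransportClient): TransportClient = {
  val cached = pool(idx)
  if (cached != null && cached.isActive) cached
  else {
    val created = createNew()
    pool(idx) = created
    created
  }
}
{code}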




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48540) Avoid ivy output loading settings to stdout

2024-06-05 Thread dzcxzl (Jira)
dzcxzl created SPARK-48540:
--

 Summary: Avoid ivy output loading settings to stdout
 Key: SPARK-48540
 URL: https://issues.apache.org/jira/browse/SPARK-48540
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49039) Reset checkbox when executor metrics are loaded in the Stages tab

2024-07-29 Thread dzcxzl (Jira)
dzcxzl created SPARK-49039:
--

 Summary: Reset checkbox when executor metrics are loaded in the 
Stages tab
 Key: SPARK-49039
 URL: https://issues.apache.org/jira/browse/SPARK-49039
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.2.0, 3.1.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49217) Support separate buffer size configuration in UnsafeShuffleWriter

2024-08-12 Thread dzcxzl (Jira)
dzcxzl created SPARK-49217:
--

 Summary: Support separate buffer size configuration in 
UnsafeShuffleWriter
 Key: SPARK-49217
 URL: https://issues.apache.org/jira/browse/SPARK-49217
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-49217) Support separate buffer size configuration in UnsafeShuffleWriter

2024-08-23 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-49217:
---
Description: 
{{UnsafeShuffleWriter#mergeSpillsWithFileStream}} uses 
{{spark.shuffle.file.buffer}} as the buffer size for reading spill files, and 
this buffer is allocated off-heap.

During the spill phase we want this buffer to be large, but once a spill 
produces many files, {{UnsafeShuffleWriter#mergeSpillsWithFileStream}} has to 
allocate a lot of off-heap memory, which makes the executor easily killed by 
YARN.

 

[https://github.com/apache/spark/blob/e72d21c299a450e48b3cf6e5d36b8f3e9a568088/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java#L372-L375]

 
{code:java}
       for (int i = 0; i < spills.length; i++) {
        spillInputStreams[i] = new NioBufferedFileInputStream(
          spills[i].file,
          inputBufferSizeInBytes);{code}
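
A sketch of the separation (treat the new key name as an assumption; the 
fallback keeps today's behavior):
{code:java}
import org.apache.spark.SparkConf

// Give the merge-phase read buffer its own knob, defaulting to the spill
// write buffer so existing jobs are unaffected.
def mergeBufferBytes(conf: SparkConf): Long =
  conf.getSizeAsBytes("spark.shuffle.file.merge.buffer",
    conf.get("spark.shuffle.file.buffer", "32k"))
{code}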

> Support separate buffer size configuration in UnsafeShuffleWriter
> -
>
> Key: SPARK-49217
> URL: https://issues.apache.org/jira/browse/SPARK-49217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {{UnsafeShuffleWriter#mergeSpillsWithFileStream}} uses 
> {{spark.shuffle.file.buffer}} as the buffer for reading spill files, and this 
> buffer is an off-heap buffer.
> In the spill process, we hope that the buffer size is larger, but once there 
> are too many files in the spill, 
> {{UnsafeShuffleWriter#mergeSpillsWithFileStream}} needs to create a lot of 
> off-heap memory, which makes the executor easily killed by YARN.
>  
> [https://github.com/apache/spark/blob/e72d21c299a450e48b3cf6e5d36b8f3e9a568088/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java#L372-L375]
>  
> {code:java}
>        for (int i = 0; i < spills.length; i++) {
>         spillInputStreams[i] = new NioBufferedFileInputStream(
>           spills[i].file,
>           inputBufferSizeInBytes);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49386) Add memory based thresholds for shuffle spill

2024-08-25 Thread dzcxzl (Jira)
dzcxzl created SPARK-49386:
--

 Summary: Add memory based thresholds for shuffle spill
 Key: SPARK-49386
 URL: https://issues.apache.org/jira/browse/SPARK-49386
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: dzcxzl


We can currently only force spills by record count, via 
{{spark.shuffle.spill.numElementsForceSpillThreshold}}. In some scenarios the 
in-memory size of a single row may be very large, so a size-based threshold is 
also needed; a sketch follows.
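
A sketch of the size-aware condition (the byte threshold and its wiring are 
assumptions):
{code:java}
// Spill when either the record count or the accumulated in-memory bytes
// cross their thresholds, so a handful of very wide rows still spills.
def shouldSpill(numRecords: Long, bytesInMemory: Long,
                maxRecords: Long, maxBytes: Long): Boolean =
  numRecords >= maxRecords || bytesInMemory >= maxBytes
{code}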



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49445) Support show tooltip in the progress bar of UI

2024-08-28 Thread dzcxzl (Jira)
dzcxzl created SPARK-49445:
--

 Summary: Support show tooltip in the progress bar of UI
 Key: SPARK-49445
 URL: https://issues.apache.org/jira/browse/SPARK-49445
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 4.0.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49502) Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle

2024-09-03 Thread dzcxzl (Jira)
dzcxzl created SPARK-49502:
--

 Summary: Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle
 Key: SPARK-49502
 URL: https://issues.apache.org/jira/browse/SPARK-49502
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49509) Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect

2024-09-04 Thread dzcxzl (Jira)
dzcxzl created SPARK-49509:
--

 Summary: Use Platform.allocateDirectBuffer instead of 
ByteBuffer.allocateDirect
 Key: SPARK-49509
 URL: https://issues.apache.org/jira/browse/SPARK-49509
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27630) Stage retry causes totalRunningTasks calculation to be negative

2019-05-30 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-27630:
---
Description: 
In the case of stage retry, the {{taskEnd}} event from the zombie stage 
sometimes makes the number of {{totalRunningTasks}} negative, which causes the 
job to get stuck.
A similar problem also exists with {{stageIdToTaskIndices}} & 
{{stageIdToSpeculativeTaskIndices}}.
If a failed {{taskEnd}} event arrives from the zombie stage, it causes 
{{stageIdToTaskIndices}} or {{stageIdToSpeculativeTaskIndices}} to remove the 
task index of the active stage, and the number of {{totalPendingTasks}} 
increases unexpectedly. A sketch of a guard follows.
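
A hedged sketch of the guard this implies (field names are hypothetical, 
loosely after ExecutorAllocationManager's listener):
{code:java}
import scala.collection.mutable

// Track live stage attempts; events from retired (zombie) attempts must
// not touch the counters.
val activeStageAttempts = mutable.Set[(Int, Int)]()
var totalRunningTasks = 0

def onTaskEnd(stageId: Int, stageAttemptId: Int): Unit = {
  if (activeStageAttempts.contains((stageId, stageAttemptId))) {
    totalRunningTasks = math.max(0, totalRunningTasks - 1)
  }
}
{code}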

  was:In the case of stage retry, the onTaskEnd event may be sent after the new 
stage is submitted. This will cause the ExecutorAllocationManager to calculate 
that the currently running task is negative.


> Stage retry causes totalRunningTasks calculation to be negative
> ---
>
> Key: SPARK-27630
> URL: https://issues.apache.org/jira/browse/SPARK-27630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: dzcxzl
>Priority: Minor
>
> In the case of stage retry, the {{taskEnd}} event from the zombie stage 
> sometimes makes the number of {{totalRunningTasks}} negative, which causes 
> the job to get stuck.
> Similar problem also exists with {{stageIdToTaskIndices}} & 
> {{stageIdToSpeculativeTaskIndices}}.
> If it is a failed {{taskEnd}} event of the zombie stage, this will cause 
> {{stageIdToTaskIndices}} or {{stageIdToSpeculativeTaskIndices}} to remove the 
> task index of the active stage, and the number of {{totalPendingTasks}} will 
> increase unexpectedly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27706) Add SQL metrics of numOutputRows for BroadcastExchangeExec

2019-05-30 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl resolved SPARK-27706.

Resolution: Not A Problem

> Add SQL metrics of numOutputRows for BroadcastExchangeExec
> --
>
> Key: SPARK-27706
> URL: https://issues.apache.org/jira/browse/SPARK-27706
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Add SQL metrics of numOutputRows for BroadcastExchangeExec



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28012) Hive UDF supports literal struct type

2019-06-11 Thread dzcxzl (JIRA)
dzcxzl created SPARK-28012:
--

 Summary: Hive UDF supports literal struct type
 Key: SPARK-28012
 URL: https://issues.apache.org/jira/browse/SPARK-28012
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: dzcxzl


Currently, calling a Hive UDF with a literal struct-type argument reports an 
error:

No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't 
support the constant type [StructType(StructField(name,StringType,true), 
StructField(value,DecimalType(3,1),true))]
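
A spark-shell shaped repro sketch (xxxUDF and its class are placeholders):
{code:java}
// named_struct builds a foldable struct literal; passing it to a Hive UDF
// currently trips the "constant type" check above.
spark.sql("CREATE TEMPORARY FUNCTION xxxUDF AS 'com.example.XxxUDF'")
spark.sql("SELECT xxxUDF(named_struct('name', 'a', 'value', 1.5BD))")
{code}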



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28012) Hive UDF supports literal struct type

2019-06-17 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-28012:
---
Description: 
Currently, when a Hive UDF is called with a struct-type argument, an exception 
is thrown:

No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't 
support the constant type [StructType(StructField(name,StringType,true), 
StructField(value,DecimalType(3,1),true))]

  was:
Currently using hive udf, the parameter is literal struct type, will report an 
error.

No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't 
support the constant type [StructType(StructField(name,StringType,true), 
StructField(value,DecimalType(3,1),true))]


> Hive UDF supports literal struct type
> -
>
> Key: SPARK-28012
> URL: https://issues.apache.org/jira/browse/SPARK-28012
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Currently using hive udf, the parameter is struct type, there will be an 
> exception thrown.
> No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't 
> support the constant type [StructType(StructField(name,StringType,true), 
> StructField(value,DecimalType(3,1),true))]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28012) Hive UDF supports struct type foldable expression

2019-06-17 Thread dzcxzl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-28012:
---
Summary: Hive UDF supports struct type foldable expression  (was: Hive UDF 
supports literal struct type)

> Hive UDF supports struct type foldable expression
> -
>
> Key: SPARK-28012
> URL: https://issues.apache.org/jira/browse/SPARK-28012
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Currently using hive udf, the parameter is struct type, there will be an 
> exception thrown.
> No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't 
> support the constant type [StructType(StructField(name,StringType,true), 
> StructField(value,DecimalType(3,1),true))]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)
dzcxzl created SPARK-33753:
--

 Summary: Reduce the memory footprint and gc of the cache 
(hadoopJobMetadata)
 Key: SPARK-33753
 URL: https://issues.apache.org/jira/browse/SPARK-33753
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: dzcxzl


HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M).

 

Current:

!image-2020-12-11-16-08-23-861.png!

jstat -gcutil PID 2s

!image-2020-12-11-16-08-53-656.png!

!image-2020-12-11-16-10-07-363.png!

 

After changing softValues to weakValues:

!image-2020-12-11-16-11-26-673.png!

!image-2020-12-11-16-11-35-988.png!

!image-2020-12-11-16-12-22-035.png!
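
The change itself is one word. A sketch of the idea, assuming the cache is built with Guava's MapMaker as SparkEnv.hadoopJobMetadata is:

{code:scala}
import com.google.common.collect.MapMaker

// Before: soft references are only cleared under memory pressure, so
// rarely reused jobconfs pile up in Old Gen until a full GC.
val softCache = new MapMaker().softValues().makeMap[String, AnyRef]()

// After: weak references let unreferenced jobconfs be collected on the
// next GC cycle instead of accumulating.
val weakCache = new MapMaker().weakValues().makeMap[String, AnyRef]()
{code}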

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Description: 
HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M).

 

Current:

!image-2020-12-11-16-17-28-991.png!

jstat -gcutil PID 2s

!image-2020-12-11-16-08-53-656.png!

!image-2020-12-11-16-10-07-363.png!

 

After changing softValues to weakValues:

!image-2020-12-11-16-11-26-673.png!

!image-2020-12-11-16-11-35-988.png!

!image-2020-12-11-16-12-22-035.png!

 

 

 

 

 

  was:
HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M).

 

Current:

!image-2020-12-11-16-08-23-861.png!

jstat -gcutil PID 2s

!image-2020-12-11-16-08-53-656.png!

!image-2020-12-11-16-10-07-363.png!

 

After changing softValues to weakValues:

!image-2020-12-11-16-11-26-673.png!

!image-2020-12-11-16-11-35-988.png!

!image-2020-12-11-16-12-22-035.png!

 

 

 

 

 


> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Attachment: current_job_finish_time.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_job_finish_time.png
>
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Attachment: current_visual_gc.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png
>
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Attachment: current_gcutil.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png
>
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Attachment: fix_visual_gc.png
fix_job_finish_time.png
fix_gcutil.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png, fix_gcutil.png, fix_job_finish_time.png, 
> fix_visual_gc.png
>
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Description: 
 

HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M). The job execution time was also reduced.

 

Current:

!current_job_finish_time.png!

jstat -gcutil PID 2s

!current_gcutil.png!

!current_visual_gc.png!

 

After changing softValues to weakValues:

!fix_job_finish_time.png!

!fix_gcutil.png!

!fix_visual_gc.png!

 

 

 

 

 

  was:
HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M).

 

Current:

!image-2020-12-11-16-17-28-991.png!

jstat -gcutil PID 2s

!image-2020-12-11-16-08-53-656.png!

!image-2020-12-11-16-10-07-363.png!

 

After changing softValues to weakValues:

!image-2020-12-11-16-11-26-673.png!

!image-2020-12-11-16-11-35-988.png!

!image-2020-12-11-16-12-22-035.png!

 

 

 

 

 


> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png, fix_gcutil.png, fix_job_finish_time.png, 
> fix_visual_gc.png
>
>
>  
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M). The job execution
> time was also reduced.
>  
> Current:
> !current_job_finish_time.png!
> jstat -gcutil PID 2s
> !current_gcutil.png!
> !current_visual_gc.png!
>  
> After changing softValues to weakValues:
> !fix_job_finish_time.png!
> !fix_gcutil.png!
> !fix_visual_gc.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Attachment: jobconf.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png, fix_gcutil.png, fix_job_finish_time.png, 
> fix_visual_gc.png, jobconf.png
>
>
>  
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M). The job execution
> time was also reduced.
>  
> Current:
> !current_job_finish_time.png!
> jstat -gcutil PID 2s
> !current_gcutil.png!
> !current_visual_gc.png!
>  
> After changing softValues to weakValues:
> !fix_job_finish_time.png!
> !fix_gcutil.png!
> !fix_visual_gc.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Description: 
 

HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

!jobconf.png!

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M). The job execution time was also reduced.

 

Current:

!current_job_finish_time.png!

jstat -gcutil PID 2s

!current_gcutil.png!

!current_visual_gc.png!

 

After changing softValues to weakValues:

!fix_job_finish_time.png!

!fix_gcutil.png!

!fix_visual_gc.png!

 

 

 

 

 

  was:
 

HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M). The job execution time was also reduced.

 

Current:

!current_job_finish_time.png!

jstat -gcutil PID 2s

!current_gcutil.png!

!current_visual_gc.png!

 

After changing softValues to weakValues:

!fix_job_finish_time.png!

!fix_gcutil.png!

!fix_visual_gc.png!

 

 

 

 

 


> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png, fix_gcutil.png, fix_job_finish_time.png, 
> fix_visual_gc.png, jobconf.png
>
>
>  
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC occurs frequently, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> !jobconf.png!
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M). The job execution
> time was also reduced.
>  
> Current:
> !current_job_finish_time.png!
> jstat -gcutil PID 2s
> !current_gcutil.png!
> !current_visual_gc.png!
>  
> After changing softValues to weakValues:
> !fix_job_finish_time.png!
> !fix_gcutil.png!
> !fix_visual_gc.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33753) Reduce the memory footprint and gc of the cache (hadoopJobMetadata)

2020-12-11 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---
Description: 
 

HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC becomes very frequent, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

!jobconf.png!

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M). The job execution time was also reduced.

 

Current:

!current_job_finish_time.png!

jstat -gcutil PID 2s

!current_gcutil.png!

!current_visual_gc.png!

 

After changing softValues to weakValues:

!fix_job_finish_time.png!

!fix_gcutil.png!

!fix_visual_gc.png!

 

 

 

 

 

  was:
 

HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
When the driver reads a large number of Hive partitions, HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
The executor also creates a jobconf, adds it to the cache, and shares it among executors.

The jobconfs in the driver cache increase memory pressure. When the driver memory is not large, full GC occurs frequently, and these jobconfs are hardly ever reused.

For example, with spark.driver.memory=2560m, about 14,000 partitions are read, and each jobconf is about 96 KB.

!jobconf.png!

The comparison below shows the effect of the fix: total full GC time decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5, and the driver used less memory (Old Gen 1.667G -> 968M). The job execution time was also reduced.

 

Current:

!current_job_finish_time.png!

jstat -gcutil PID 2s

!current_gcutil.png!

!current_visual_gc.png!

 

After changing softValues to weakValues:

!fix_job_finish_time.png!

!fix_gcutil.png!

!fix_visual_gc.png!

 

 

 

 

 


> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> ---
>
> Key: SPARK-33753
> URL: https://issues.apache.org/jira/browse/SPARK-33753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Minor
> Attachments: current_gcutil.png, current_job_finish_time.png, 
> current_visual_gc.png, fix_gcutil.png, fix_job_finish_time.png, 
> fix_visual_gc.png, jobconf.png
>
>
>  
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
> When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
> The executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs in the driver cache increase memory pressure. When the driver
> memory is not large, full GC becomes very frequent, and these jobconfs are
> hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read, and each jobconf is about 96 KB.
> !jobconf.png!
> The comparison below shows the effect of the fix: total full GC time
> decreased from 62s to 0.8s, the number of full GCs decreased from 31 to 5,
> and the driver used less memory (Old Gen 1.667G -> 968M). The job execution
> time was also reduced.
>  
> Current:
> !current_job_finish_time.png!
> jstat -gcutil PID 2s
> !current_gcutil.png!
> !current_visual_gc.png!
>  
> After changing softValues to weakValues:
> !fix_job_finish_time.png!
> !fix_gcutil.png!
> !fix_visual_gc.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2020-12-15 Thread dzcxzl (Jira)
dzcxzl created SPARK-33790:
--

 Summary: Reduce the rpc call of getFileStatus in 
SingleFileEventLogFileReader
 Key: SPARK-33790
 URL: https://issues.apache.org/jira/browse/SPARK-33790
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: dzcxzl


FsHistoryProvider#checkForLogs already has a FileStatus when it constructs SingleFileEventLogFileReader, so there is no need to fetch the FileStatus again in SingleFileEventLogFileReader#fileSizeForLastIndex.
This avoids a large number of RPC calls.
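
A sketch of the idea (simplified; the optional constructor parameter is illustrative of the change, not the exact Spark signature):

{code:scala}
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// The reader keeps the FileStatus it was constructed with, so
// fileSizeForLastIndex no longer needs an extra NameNode RPC per log.
class SingleFileEventLogFileReader(
    fs: FileSystem,
    path: Path,
    maybeStatus: Option[FileStatus] = None) {

  private lazy val status: FileStatus =
    maybeStatus.getOrElse(fs.getFileStatus(path)) // RPC only as a fallback

  def fileSizeForLastIndex: Long = status.getLen
}
{code}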



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2020-12-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33790:
---
Description: 
FsHistoryProvider#checkForLogs already has a FileStatus when it constructs SingleFileEventLogFileReader, so there is no need to fetch the FileStatus again in SingleFileEventLogFileReader#fileSizeForLastIndex.
This avoids a large number of RPC calls and speeds up the history server.

  was:
FsHistoryProvider#checkForLogs already has a FileStatus when it constructs SingleFileEventLogFileReader, so there is no need to fetch the FileStatus again in SingleFileEventLogFileReader#fileSizeForLastIndex.
This avoids a large number of RPC calls.


> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread dzcxzl (Jira)
dzcxzl created SPARK-33900:
--

 Summary: Show shuffle read size / records correctly when only 
remotebytesread is available
 Key: SPARK-33900
 URL: https://issues.apache.org/jira/browse/SPARK-33900
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.1
Reporter: dzcxzl


At present, the stage page displays Shuffle Read Size / Records only when localBytesRead > 0.

Sometimes the shuffle read metrics have remoteBytesRead > 0 with localBytesRead = 0, and then nothing is displayed.
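
A sketch of the intended condition (the names are illustrative; the real logic lives in the stage page rendering): the column should be shown whenever any shuffle data was read, not only local data.

{code:scala}
// Hypothetical simplified metrics holder.
case class ShuffleReadMetrics(
    localBytesRead: Long,
    remoteBytesRead: Long,
    recordsRead: Long) {
  def totalBytesRead: Long = localBytesRead + remoteBytesRead
}

// totalBytesRead > 0 covers the remote-only case that a bare
// localBytesRead > 0 check hides.
def hasShuffleRead(m: ShuffleReadMetrics): Boolean = m.totalBytesRead > 0
{code}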



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-14 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265691#comment-17265691
 ] 

dzcxzl commented on SPARK-33790:


This is indeed a performance regression.

Here is my case: in the 2.x version, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe and may hang.

In the 3.x version this became EventLogFileReader.codecMap, of type ConcurrentHashMap.

So in the 2.x version, the history server can stop working.

When I tried the 3.x version, I found that one round of scanning slowed down a lot, from about 7 min to about 23 min.

In addition, should I fix the thread-safety issue in the 2.x version?

[~kabhwan]

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.2.0
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-14 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724
 ] 

dzcxzl commented on SPARK-33790:


Thread stack when not working
!http://git.dev.sh.ctripcorp.com/framework-di/spark-2.2.0/uploads/9cfa9662f563ac64f77f4d4ee6fd9243/image.png!

 

[https://github.com/scala/bug/issues/10436]

 

 

 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.2.0
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)
dzcxzl created SPARK-34125:
--

 Summary: Make EventLoggingListener.codecMap thread-safe
 Key: SPARK-34125
 URL: https://issues.apache.org/jira/browse/SPARK-34125
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.7
Reporter: dzcxzl
 Attachments: jstack.png, top.png

In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there.
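
A sketch of the kind of fix involved (Spark-internal code, so this is illustrative; CompressionCodec.createCodec is Spark's codec factory):

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// A mutable.HashMap updated from several threads can corrupt its internal
// structure and spin forever, which matches the "stuck" symptom; a
// ConcurrentHashMap makes the lookup-or-create step atomic and safe.
val codecMap = new ConcurrentHashMap[String, CompressionCodec]()

def codec(conf: SparkConf, name: String): CompressionCodec =
  codecMap.computeIfAbsent(name, n => CompressionCodec.createCodec(conf, n))
{code}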



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-34125:
---
Attachment: top.png

> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is
> of type mutable.HashMap, which is not thread-safe.
> This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version this was changed to EventLogFileReader.codecMap, of type
> ConcurrentHashMap, so the problem does not exist there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-34125:
---
Attachment: jstack.png

> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is
> of type mutable.HashMap, which is not thread-safe.
> This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version this was changed to EventLogFileReader.codecMap, of type
> ConcurrentHashMap, so the problem does not exist there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-34125:
---
Description: 
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there.

PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

 

 

  was:
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there.


> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is
> of type mutable.HashMap, which is not thread-safe.
> This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version this was changed to EventLogFileReader.codecMap, of type
> ConcurrentHashMap, so the problem does not exist there.
> PID 117049 0x1c939
> !top.png!
>  
> !jstack.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-34125:
---
Description: 
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there ([SPARK-28869|https://issues.apache.org/jira/browse/SPARK-28869]).

PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

 

 

  was:
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there.

PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

 

 


> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is
> of type mutable.HashMap, which is not thread-safe.
> This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version this was changed to EventLogFileReader.codecMap, of type
> ConcurrentHashMap, so the problem does not exist there
> ([SPARK-28869|https://issues.apache.org/jira/browse/SPARK-28869]).
> PID 117049 0x1c939
> !top.png!
>  
> !jstack.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-15 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-34125:
---
Description: 
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there (-SPARK-28869-).

PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

 

 

  was:
In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe.
This can cause the history server to suddenly get stuck and stop working.

In the 3.x version this was changed to EventLogFileReader.codecMap, of type ConcurrentHashMap, so the problem does not exist there ([SPARK-28869|https://issues.apache.org/jira/browse/SPARK-28869]).

PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

 

 


> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is
> of type mutable.HashMap, which is not thread-safe.
> This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version this was changed to EventLogFileReader.codecMap, of type
> ConcurrentHashMap, so the problem does not exist there (-SPARK-28869-).
> PID 117049 0x1c939
> !top.png!
>  
> !jstack.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-15 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265812#comment-17265812
 ] 

dzcxzl commented on SPARK-33790:


OK, I opened a JIRA: [SPARK-34125|https://issues.apache.org/jira/browse/SPARK-34125]
 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.2.0
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-15 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724
 ] 

dzcxzl edited comment on SPARK-33790 at 1/15/21, 4:25 PM:
--

Thread stack when not working.
PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

[https://github.com/scala/bug/issues/10436]

 

 

 


was (Author: dzcxzl):
Thread stack when not working
!http://git.dev.sh.ctripcorp.com/framework-di/spark-2.2.0/uploads/9cfa9662f563ac64f77f4d4ee6fd9243/image.png!

 

[https://github.com/scala/bug/issues/10436]

 

 

 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-15 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724
 ] 

dzcxzl edited comment on SPARK-33790 at 1/15/21, 4:26 PM:
--

Thread stack when not working.
 PID 117049 0x1c939

!top.png!

!jstack.png!  

 

 

[https://github.com/scala/bug/issues/10436]

 

 

 


was (Author: dzcxzl):
Thread stack when not working.
PID 117049 0x1c939

!top.png!

 

!jstack.png!

 

[https://github.com/scala/bug/issues/10436]

 

 

 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-15 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724
 ] 

dzcxzl edited comment on SPARK-33790 at 1/15/21, 4:27 PM:
--

Thread stack when not working.
 PID 117049 0x1c939

[^top.png]

[^jstack.png]

 

 

[https://github.com/scala/bug/issues/10436]

 

 

 


was (Author: dzcxzl):
Thread stack when not working.
 PID 117049 0x1c939

!top.png!

!jstack.png!  

 

 

[https://github.com/scala/bug/issues/10436]

 

 

 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader

2021-01-15 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724
 ] 

dzcxzl edited comment on SPARK-33790 at 1/15/21, 4:28 PM:
--

[https://github.com/scala/bug/issues/10436]

 


was (Author: dzcxzl):
Thread stack when not working.
 PID 117049 0x1c939

[^top.png]

[^jstack.png]

 

 

[https://github.com/scala/bug/issues/10436]

 

 

 

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
> 
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Critical
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> FsHistoryProvider#checkForLogs already has a FileStatus when it constructs
> SingleFileEventLogFileReader, so there is no need to fetch the FileStatus
> again in SingleFileEventLogFileReader#fileSizeForLastIndex.
> This avoids a large number of RPC calls and speeds up the history server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35437) Hive partition filtering client optimization

2021-05-18 Thread dzcxzl (Jira)
dzcxzl created SPARK-35437:
--

 Summary: Hive partition filtering client optimization
 Key: SPARK-35437
 URL: https://issues.apache.org/jira/browse/SPARK-35437
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.1
Reporter: dzcxzl


When a table has many partitions and the filter cannot be evaluated on the MetaStore server, we currently fetch the details of all partitions and filter them on the client side. This is slow and puts a lot of pressure on the MetaStore server.
Instead, we can first pull only the partition names, filter them by the expressions, and then fetch detailed information from the MetaStore server only for the matching partitions.
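
A sketch of the intended flow (the two client calls are from the Hive metastore API; the name predicate stands in for the expression filtering):

{code:scala}
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.metastore.IMetaStoreClient
import org.apache.hadoop.hive.metastore.api.Partition

// Fetch cheap partition names first, filter on the client, then fetch
// full partition objects only for the matches.
def prunedPartitions(
    client: IMetaStoreClient,
    db: String,
    table: String,
    matches: String => Boolean): Seq[Partition] = {
  // Names only, e.g. "d=2020-01-02/h=01"; -1 means no limit.
  val names = client.listPartitionNames(db, table, (-1).toShort).asScala
  client.getPartitionsByNames(db, table, names.filter(matches).asJava).asScala.toSeq
}
{code}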



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31590) The filter used by Metadata-only queries should not have Unevaluable

2020-04-27 Thread dzcxzl (Jira)
dzcxzl created SPARK-31590:
--

 Summary: The filter used by Metadata-only queries should not have 
Unevaluable
 Key: SPARK-31590
 URL: https://issues.apache.org/jira/browse/SPARK-31590
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: dzcxzl


code:
{code:scala}
sql("set spark.sql.optimizer.metadataOnly=true")
sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET 
PARTITIONED BY (d ,h)")
sql("""
|INSERT OVERWRITE TABLE test_tbl PARTITION(d,h)
|SELECT 1,'2020-01-01','23'
|UNION ALL
|SELECT 2,'2020-01-02','01'
|UNION ALL
|SELECT 3,'2020-01-02','02'
""".stripMargin)
sql(
  s"""
 |SELECT d, MAX(h) AS h
 |FROM test_tbl
 |WHERE d= (
 |  SELECT MAX(d) AS d
 |  FROM test_tbl
 |)
 |GROUP BY d
""".stripMargin).collect()
{code}

Exception:
{code:java}
java.lang.UnsupportedOperationException: Cannot evaluate expression: 
scalar-subquery#48 []

...
at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.prunePartitions(PartitioningAwareFileIndex.scala:180)
{code}

optimizedPlan:
{code:java}
Aggregate [d#245], [d#245, max(h#246) AS h#243]
+- Project [d#245, h#246]
   +- Filter (isnotnull(d#245) AND (d#245 = scalar-subquery#242 []))
      :  +- Aggregate [max(d#245) AS d#241]
      :     +- LocalRelation <empty>, [d#245]
      +- Relation[a#244,d#245,h#246] parquet
{code}
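
The title points at the missing guard. A sketch of the kind of check (illustrative, not the actual patch): a filter should only take part in partition pruning if it contains no Unevaluable expression, such as the scalar subquery above.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Expression, Unevaluable}

// Illustrative guard: reject filters containing any Unevaluable node
// before handing them to PartitioningAwareFileIndex.prunePartitions.
def usableForPruning(filter: Expression): Boolean =
  filter.find {
    case _: Unevaluable => true
    case _ => false
  }.isEmpty
{code}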






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31590) The filter used by Metadata-only queries should not have Unevaluable

2020-04-27 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-31590:
---
Description: 
When using SPARK-23877 (metadata-only query optimization), some SQL statements fail to execute.

code:
{code:scala}
sql("set spark.sql.optimizer.metadataOnly=true")
sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET 
PARTITIONED BY (d ,h)")
sql("""
|INSERT OVERWRITE TABLE test_tbl PARTITION(d,h)
|SELECT 1,'2020-01-01','23'
|UNION ALL
|SELECT 2,'2020-01-02','01'
|UNION ALL
|SELECT 3,'2020-01-02','02'
""".stripMargin)
sql(
  s"""
 |SELECT d, MAX(h) AS h
 |FROM test_tbl
 |WHERE d= (
 |  SELECT MAX(d) AS d
 |  FROM test_tbl
 |)
 |GROUP BY d
""".stripMargin).collect()
{code}
Exception:
{code:java}
java.lang.UnsupportedOperationException: Cannot evaluate expression: 
scalar-subquery#48 []

...
at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.prunePartitions(PartitioningAwareFileIndex.scala:180)
{code}
optimizedPlan:
{code:java}
Aggregate [d#245], [d#245, max(h#246) AS h#243]
+- Project [d#245, h#246]
   +- Filter (isnotnull(d#245) AND (d#245 = scalar-subquery#242 []))
      :  +- Aggregate [max(d#245) AS d#241]
      :     +- LocalRelation <empty>, [d#245]
      +- Relation[a#244,d#245,h#246] parquet
{code}

  was:
code:
{code:scala}
sql("set spark.sql.optimizer.metadataOnly=true")
sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET 
PARTITIONED BY (d ,h)")
sql("""
|INSERT OVERWRITE TABLE test_tbl PARTITION(d,h)
|SELECT 1,'2020-01-01','23'
|UNION ALL
|SELECT 2,'2020-01-02','01'
|UNION ALL
|SELECT 3,'2020-01-02','02'
""".stripMargin)
sql(
  s"""
 |SELECT d, MAX(h) AS h
 |FROM test_tbl
 |WHERE d= (
 |  SELECT MAX(d) AS d
 |  FROM test_tbl
 |)
 |GROUP BY d
""".stripMargin).collect()
{code}

Exception:
{code:java}
java.lang.UnsupportedOperationException: Cannot evaluate expression: 
scalar-subquery#48 []

...
at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.prunePartitions(PartitioningAwareFileIndex.scala:180)
{code}

optimizedPlan:
{code:java}
Aggregate [d#245], [d#245, max(h#246) AS h#243]
+- Project [d#245, h#246]
   +- Filter (isnotnull(d#245) AND (d#245 = scalar-subquery#242 []))
      :  +- Aggregate [max(d#245) AS d#241]
      :     +- LocalRelation <empty>, [d#245]
      +- Relation[a#244,d#245,h#246] parquet
{code}





> The filter used by Metadata-only queries should not have Unevaluable
> 
>
> Key: SPARK-31590
> URL: https://issues.apache.org/jira/browse/SPARK-31590
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Priority: Trivial
>
> When using SPARK-23877 (metadata-only query optimization), some SQL statements fail to execute.
> code:
> {code:scala}
> sql("set spark.sql.optimizer.metadataOnly=true")
> sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET 
> PARTITIONED BY (d ,h)")
> sql("""
> |INSERT OVERWRITE TABLE test_tbl PARTITION(d,h)
> |SELECT 1,'2020-01-01','23'
> |UNION ALL
> |SELECT 2,'2020-01-02','01'
> |UNION ALL
> |SELECT 3,'2020-01-02','02'
> """.stripMargin)
> sql(
>   s"""
>  |SELECT d, MAX(h) AS h
>  |FROM test_tbl
>  |WHERE d= (
>  |  SELECT MAX(d) AS d
>  |  FROM test_tbl
>  |)
>  |GROUP BY d
> """.stripMargin).collect()
> {code}
> Exception:
> {code:java}
> java.lang.UnsupportedOperationException: Cannot evaluate expression: 
> scalar-subquery#48 []
> ...
> at 
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.prunePartitions(PartitioningAwareFileIndex.scala:180)
> {code}
> optimizedPlan:
> {code:java}
> Aggregate [d#245], [d#245, max(h#246) AS h#243]
> +- Project [d#245, h#246]
>+- Filter (isnotnull(d#245) AND (d#245 = scalar-subquery#242 []))
>   :  +- Aggregate [max(d#245) AS d#241]
>   :     +- LocalRelation <empty>, [d#245]
>   +- Relation[a#244,d#245,h#246] parquet
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


