[jira] [Created] (SPARK-30364) The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le

2019-12-27 Thread AK97 (Jira)
AK97 created SPARK-30364:


 Summary: The spark-streaming-kafka-0-10_2.11 test cases are 
failing on ppc64le
 Key: SPARK-30364
 URL: https://issues.apache.org/jira/browse/SPARK-30364
 Project: Spark
  Issue Type: Test
  Components: Build
Affects Versions: 2.4.0
 Environment: 
os: rhel 7.6
arch: ppc64le
Reporter: AK97


I have been trying to build Apache Spark on RHEL 7.6/ppc64le. The 
spark-streaming-kafka-0-10_2.11 test cases are failing with the following error:

[ERROR] 
/opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85:
 Symbol 'term org.eclipse' is missing from the classpath.
This symbol is required by 'method 
org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
Make sure that term eclipse is in your classpath and check for conflicting 
dependencies with `-Ylog-classpath`.
A full rebuild may help if 'MetricsSystem.class' was compiled against an 
incompatible version of org.
[ERROR] testUtils.sendMessages(topic, data.toArray) 
 ^


I would like some help understanding the cause. I am running it on a high-end 
VM with good connectivity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30364) The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le

2019-12-27 Thread AK97 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AK97 updated SPARK-30364:
-
Description: 
I have been trying to build Apache Spark on RHEL 7.6/ppc64le; however, the 
spark-streaming-kafka-0-10_2.11 test cases are failing with the following error:

[ERROR] 
/opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85:
 Symbol 'term org.eclipse' is missing from the classpath.
This symbol is required by 'method 
org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
Make sure that term eclipse is in your classpath and check for conflicting 
dependencies with `-Ylog-classpath`.
A full rebuild may help if 'MetricsSystem.class' was compiled against an 
incompatible version of org.
[ERROR] testUtils.sendMessages(topic, data.toArray) 
 ^


I would like some help understanding the cause. I am running it on a high-end 
VM with good connectivity.

  was:
I have been trying to build Apache Spark on RHEL 7.6/ppc64le. The 
spark-streaming-kafka-0-10_2.11 test cases are failing with the following error:

[ERROR] 
/opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85:
 Symbol 'term org.eclipse' is missing from the classpath.
This symbol is required by 'method 
org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
Make sure that term eclipse is in your classpath and check for conflicting 
dependencies with `-Ylog-classpath`.
A full rebuild may help if 'MetricsSystem.class' was compiled against an 
incompatible version of org.
[ERROR] testUtils.sendMessages(topic, data.toArray) 
 ^


I would like some help understanding the cause. I am running it on a high-end 
VM with good connectivity.


> The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le
> -
>
> Key: SPARK-30364
> URL: https://issues.apache.org/jira/browse/SPARK-30364
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.4.0
> Environment: os: rhel 7.6
> arch: ppc64le
>Reporter: AK97
>Priority: Major
>
> I have been trying to build Apache Spark on RHEL 7.6/ppc64le; however, 
> the spark-streaming-kafka-0-10_2.11 test cases are failing with the following 
> error:
> [ERROR] 
> /opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85:
>  Symbol 'term org.eclipse' is missing from the classpath.
> This symbol is required by 'method 
> org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
> Make sure that term eclipse is in your classpath and check for conflicting 
> dependencies with `-Ylog-classpath`.
> A full rebuild may help if 'MetricsSystem.class' was compiled against an 
> incompatible version of org.
> [ERROR] testUtils.sendMessages(topic, data.toArray)   
>^
> I would like some help understanding the cause. I am running it on a 
> high-end VM with good connectivity.






[jira] [Commented] (SPARK-30332) When running sql query with limit catalyst throw StackOverFlow exception

2019-12-27 Thread Izek Greenfield (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004146#comment-17004146
 ] 

Izek Greenfield commented on SPARK-30332:
-

The task failed; I do not get any result.

> When running sql query with limit catalyst throw StackOverFlow exception 
> -
>
> Key: SPARK-30332
> URL: https://issues.apache.org/jira/browse/SPARK-30332
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: spark version 3.0.0-preview
>Reporter: Izek Greenfield
>Priority: Major
>
> Running that SQL:
> {code:sql}
> SELECT  BT_capital.asof_date,
> BT_capital.run_id,
> BT_capital.v,
> BT_capital.id,
> BT_capital.entity,
> BT_capital.level_1,
> BT_capital.level_2,
> BT_capital.level_3,
> BT_capital.level_4,
> BT_capital.level_5,
> BT_capital.level_6,
> BT_capital.path_bt_capital,
> BT_capital.line_item,
> t0.target_line_item,
> t0.line_description,
> BT_capital.col_item,
> BT_capital.rep_amount,
> root.orgUnitId,
> root.cptyId,
> root.instId,
> root.startDate,
> root.maturityDate,
> root.amount,
> root.nominalAmount,
> root.quantity,
> root.lkupAssetLiability,
> root.lkupCurrency,
> root.lkupProdType,
> root.interestResetDate,
> root.interestResetTerm,
> root.noticePeriod,
> root.historicCostAmount,
> root.dueDate,
> root.lkupResidence,
> root.lkupCountryOfUltimateRisk,
> root.lkupSector,
> root.lkupIndustry,
> root.lkupAccountingPortfolioType,
> root.lkupLoanDepositTerm,
> root.lkupFixedFloating,
> root.lkupCollateralType,
> root.lkupRiskType,
> root.lkupEligibleRefinancing,
> root.lkupHedging,
> root.lkupIsOwnIssued,
> root.lkupIsSubordinated,
> root.lkupIsQuoted,
> root.lkupIsSecuritised,
> root.lkupIsSecuritisedServiced,
> root.lkupIsSyndicated,
> root.lkupIsDeRecognised,
> root.lkupIsRenegotiated,
> root.lkupIsTransferable,
> root.lkupIsNewBusiness,
> root.lkupIsFiduciary,
> root.lkupIsNonPerforming,
> root.lkupIsInterGroup,
> root.lkupIsIntraGroup,
> root.lkupIsRediscounted,
> root.lkupIsCollateral,
> root.lkupIsExercised,
> root.lkupIsImpaired,
> root.facilityId,
> root.lkupIsOTC,
> root.lkupIsDefaulted,
> root.lkupIsSavingsPosition,
> root.lkupIsForborne,
> root.lkupIsDebtRestructuringLoan,
> root.interestRateAAR,
> root.interestRateAPRC,
> root.custom1,
> root.custom2,
> root.custom3,
> root.lkupSecuritisationType,
> root.lkupIsCashPooling,
> root.lkupIsEquityParticipationGTE10,
> root.lkupIsConvertible,
> root.lkupEconomicHedge,
> root.lkupIsNonCurrHeldForSale,
> root.lkupIsEmbeddedDerivative,
> root.lkupLoanPurpose,
> root.lkupRegulated,
> root.lkupRepaymentType,
> root.glAccount,
> root.lkupIsRecourse,
> root.lkupIsNotFullyGuaranteed,
> root.lkupImpairmentStage,
> root.lkupIsEntireAmountWrittenOff,
> root.lkupIsLowCreditRisk,
> root.lkupIsOBSWithinIFRS9,
> root.lkupIsUnderSpecialSurveillance,
> root.lkupProtection,
> root.lkupIsGeneralAllowance,
> root.lkupSectorUltimateRisk,
> root.cptyOrgUnitId,
> root.name,
> root.lkupNationality,
> root.lkupSize,
> root.lkupIsSPV,
> root.lkupIsCentralCounterparty,
> root.lkupIsMMRMFI,
> root.lkupIsKeyManagement,
> root.lkupIsOtherRelatedParty,
> root.lkupResidenceProvince,
> root.lkupIsTradingBook,
> root.entityHierarchy_entityId,
> root.entityHierarchy_Residence,
> root.lkupLocalCurrency,
> root.cpty_entityhierarchy_entityId,
> root.lkupRelationship,
> root.cpty_lkupRelationship,
> root.entityNationality,
> root.lkupRepCurrency,
> root.startDateFinancialYear,
> root.numEmployees,
> root.numEmployeesTotal,
> root.collateralAmount,
> root.guaranteeAmount,
> root.impairmentSpecificIndividual,
> root.impairmentSpecificCollective,
> root.impairmentGeneral,
> root.creditRiskAmount,
> root.provisionSpecificIndividual,
> root.provisionSpecificCollective,
> root.provisionGeneral,
> root.writeOffAmount,
> root.interest,
> root.fairValueAmount,
> root.grossCarryingAmount,
> root.carryingAmount,
> root.code,
> root.lkupInstrumentType,
> root.price,
> root.amountAtIssue,
> root.yield,
> root.totalFacilityAmount,
> root.facility_rate,
> root.spec_indiv_est,
> root.spec_coll_est,
> root.coll_inc_loss,
> root.impairment_amount,
> root.provision_amount,
> root.accumulated_impairment,
> root.exclusionFlag,
> root.lkupIsHoldingCompany,
> root.instrument_startDate,
> root.entityResidence,
> fxRate.enumerator,
> fxRate.lkupFromCurrency,
> fxRate.rate,
> fxRate.custom1,
> fxRate.custom2,
> fxRate.custom3,
> GB_position.lkupIsECGDGuaranteed,
> GB_position.lkupIsMultiAcctOffsetMortgage,
> GB_position.lkupIsIndexLinked,
> GB_position.lkupIsRetail,
> GB_position.lkupCollateralLocation,
> GB_position.percentAboveBBR,
> GB_position.lkupIsMoreInArrears,
> GB_position.lkupIsArrearsCapitalised,
> GB_position.lkupCollateralPossession,
> GB_position.lkupIsLifetimeMortgage,
> GB_position

[jira] [Created] (SPARK-30365) When deploy mode is a client, why doesn't it support remote "spark.files" download?

2019-12-27 Thread wangzhun (Jira)
wangzhun created SPARK-30365:


 Summary: When deploy mode is a client, why doesn't it support 
remote "spark.files" download?
 Key: SPARK-30365
 URL: https://issues.apache.org/jira/browse/SPARK-30365
 Project: Spark
  Issue Type: Question
  Components: Spark Submit
Affects Versions: 2.3.2
 Environment: {code:java}
 ./bin/spark-submit \
--master yarn  \
--deploy-mode client \
..{code}
Reporter: wangzhun


{code:java}
// In client mode, download remote files.
var localPrimaryResource: String = null
var localJars: String = null
var localPyFiles: String = null
if (deployMode == CLIENT) {
  localPrimaryResource = Option(args.primaryResource).map {
downloadFile(_, targetDir, sparkConf, hadoopConf, secMgr)
  }.orNull
  localJars = Option(args.jars).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
  }.orNull
  localPyFiles = Option(args.pyFiles).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
  }.orNull
}
{code}
The above Spark 2.3 SparkSubmit code does not download the files specified by 
"spark.files".

I think it is possible to download remote files locally and add them to the 
classpath.

For example, --files could then support a remote hive-site.xml.
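By analogy with the snippet quoted above, a minimal self-contained sketch of the 
suggested behavior (purely illustrative: `downloadFile` below is a stand-in that 
fakes the fetch, and the target-path layout is an assumption, not Spark's actual 
helper or signature):

```scala
// Illustrative sketch only: resolve a comma-separated "spark.files" value
// to local copies in client mode. downloadFile fakes the remote fetch and
// returns a hypothetical local path; local paths pass through unchanged.
def downloadFile(uri: String, targetDir: String): String =
  if (uri.contains("://") && !uri.startsWith("file://"))
    s"$targetDir/${uri.split('/').last}"  // pretend the remote file was fetched here
  else
    uri                                   // already local, leave as-is

def downloadFileList(uris: String, targetDir: String): String =
  uris.split(",").map(downloadFile(_, targetDir)).mkString(",")
```

With something like this, a remote hive-site.xml listed in --files would resolve 
to a local path that could then be added to the driver classpath.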






[jira] [Assigned] (SPARK-30356) Codegen support for the function str_to_map

2019-12-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30356:
---

Assignee: Kent Yao

> Codegen support for the function str_to_map
> ---
>
> Key: SPARK-30356
> URL: https://issues.apache.org/jira/browse/SPARK-30356
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> add codegen support to str_to_map
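For reference, the semantics that a codegen path must reproduce can be sketched 
as follows (a simplified model of the SQL function only: the real str_to_map 
treats the delimiters as regexes and has its own null handling):

```scala
// Simplified model of str_to_map: split the text into pairs, then split
// each pair on the first key/value delimiter. A pair with no value maps
// its key to null, mirroring the SQL function's behavior for "a".
def strToMap(text: String,
             pairDelim: String = ",",
             kvDelim: String = ":"): Map[String, String] =
  text.split(pairDelim).map { pair =>
    val kv = pair.split(kvDelim, 2)       // split on the first delimiter only
    kv(0) -> (if (kv.length == 2) kv(1) else null)
  }.toMap
```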






[jira] [Resolved] (SPARK-30356) Codegen support for the function str_to_map

2019-12-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30356.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27013
[https://github.com/apache/spark/pull/27013]

> Codegen support for the function str_to_map
> ---
>
> Key: SPARK-30356
> URL: https://issues.apache.org/jira/browse/SPARK-30356
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> add codegen support to str_to_map






[jira] [Resolved] (SPARK-30304) When the specified shufflemanager is incorrect, print the prompt.

2019-12-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-30304.
--
Resolution: Won't Fix

> When the specified shufflemanager is incorrect, print the prompt.
> -
>
> Key: SPARK-30304
> URL: https://issues.apache.org/jira/browse/SPARK-30304
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Jiaqi Li
>Priority: Trivial
>
> During instantiation of the specified `ShuffleManager`, if the 
> configuration is wrong, the log should print a helpful hint rather than a 
> bare exception.
> before:
> {code:java}
> java.lang.ClassNotFoundException: hash
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:206)
>   at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:274)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
>   at org.apache.spark.SparkContext.(SparkContext.scala:462)
>   at org.apache.spark.SparkContext.(SparkContext.scala:131)
>   at 
> org.apache.spark.SortShuffleSuite.$anonfun$new$1(SortShuffleSuite.scala:67)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> {code}
>  
>  
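A hedged sketch of the requested behavior (illustrative only; the short names 
below match Spark's documented `sort`/`tungsten-sort` values for 
spark.shuffle.manager, but the resolution code itself is not Spark's):

```scala
// Illustrative: map short names to classes and fail with a message that
// names the valid values, instead of a bare ClassNotFoundException.
val shuffleShortNames = Map(
  "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
  "tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")

def resolveShuffleManager(configured: String): String =
  shuffleShortNames.getOrElse(configured.toLowerCase,
    if (configured.contains(".")) configured  // assume a fully-qualified class name
    else throw new IllegalArgumentException(
      s"Unknown value '$configured' for spark.shuffle.manager; " +
        s"valid short names are: ${shuffleShortNames.keys.mkString(", ")}"))
```

A misconfiguration such as "hash" would then produce a hint listing the valid 
short names rather than the stack trace above.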






[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks

2019-12-27 Thread daile (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004261#comment-17004261
 ] 

daile commented on SPARK-29596:
---

[~hyukjin.kwon] I checked the problem, reproduced it on version 2.4.4, and 
will raise a PR soon.

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  
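The expected behavior can be expressed as a small sketch (an assumption about 
the shape of the fix, not the actual Web UI code): for a running task the 
duration should be derived from the current time rather than left empty until a 
completion event arrives.

```scala
// Illustrative: a task's duration, live-updating while it runs.
// finishTime is None for a running task, so the duration grows with "now".
def taskDuration(launchTime: Long, finishTime: Option[Long], now: Long): Long =
  finishTime.getOrElse(now) - launchTime
```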






[jira] [Created] (SPARK-30366) Remove Redundant Information for InMemoryTableScan in SQL UI

2019-12-27 Thread Max Thompson (Jira)
Max Thompson created SPARK-30366:


 Summary: Remove Redundant Information for InMemoryTableScan in SQL 
UI
 Key: SPARK-30366
 URL: https://issues.apache.org/jira/browse/SPARK-30366
 Project: Spark
  Issue Type: Epic
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Max Thompson


All the JIRAs within this epic are follow-ups for 
https://issues.apache.org/jira/browse/SPARK-29431 
 
 This epic contains JIRAs for adding features to how InMemoryTableScan 
operators and their children are displayed in the SQL tab of the Web UI, aimed 
at removing redundant information that may confuse the user as to how the query 
was actually executed.






[jira] [Created] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI

2019-12-27 Thread Max Thompson (Jira)
Max Thompson created SPARK-30367:


 Summary: De-duplicate InMemoryTableScan cached plans in SQL UI
 Key: SPARK-30367
 URL: https://issues.apache.org/jira/browse/SPARK-30367
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Max Thompson


This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431
 
 Currently with the change introduced by the JIRA this follows up on, duplicate 
subtrees of the query plan can be shown if multiple InMemoryTableScans read 
from the same persisted data:
 !duplicated-imr.png!
 
 To prevent confusion, we should add an "InMemoryRelation" node that represents 
the persisted data being read from, and use it to de-duplicate shared plans 
like so:
 
!deduplicated-imr.png!
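The proposed de-duplication can be sketched as a structural-equality pass over 
the plan tree (illustrative only; `Node` is a hypothetical stand-in for a plan 
node, not Spark's API):

```scala
// Count structurally identical subtrees; any subtree occurring more than
// once is a candidate to collapse into a single shared node, the way an
// "InMemoryRelation" node would stand in for shared persisted data.
case class Node(name: String, children: List[Node] = Nil)

def subtreeCounts(root: Node): Map[Node, Int] = {
  def walk(n: Node): List[Node] = n :: n.children.flatMap(walk)
  walk(root).groupBy(identity).map { case (k, v) => k -> v.size }
}

def sharedSubtrees(root: Node): Set[Node] =
  subtreeCounts(root).collect { case (n, c) if c > 1 => n }.toSet
```

Note that nested subtrees of a shared subtree are also reported; a real pass 
would keep only the topmost shared nodes.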






[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30367:
-
Attachment: duplicated-imr.png

> De-duplicate InMemoryTableScan cached plans in SQL UI
> -
>
> Key: SPARK-30367
> URL: https://issues.apache.org/jira/browse/SPARK-30367
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: duplicated-imr.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-29431
>  
>  Currently with the change introduced by the JIRA this follows up on, 
> duplicate subtrees of the query plan can be shown if multiple 
> InMemoryTableScans read from the same persisted data:
>  !duplicated-imr.png!
>  
>  To prevent confusion, we should add an "InMemoryRelation" node that 
> represents the persisted data being read from, and use it to de-duplicate 
> shared plans like so:
>  
> !deduplicated-imr.png!






[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30367:
-
Description: 
This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431

Currently with the change introduced by the JIRA this follows up on, duplicate 
subtrees of the query plan can be shown if multiple InMemoryTableScans read 
from the same persisted data:
 !duplicated-imr.png!

To prevent confusion, we should add an "InMemoryRelation" node that represents 
the persisted data being read from, and use it to de-duplicate shared plans 
like so:
!deduplicated-imr.png!

  was:
This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431
 
 Currently with the change introduced by the JIRA this follows up on, duplicate 
subtrees of the query plan can be shown if multiple InMemoryTableScans read 
from the same persisted data:
 !duplicated-imr.png!
 
 To prevent confusion, we should add an "InMemoryRelation" node that represents 
the persisted data being read from, and use it to de-duplicate shared plans 
like so:
 
!deduplicated-imr.png!


> De-duplicate InMemoryTableScan cached plans in SQL UI
> -
>
> Key: SPARK-30367
> URL: https://issues.apache.org/jira/browse/SPARK-30367
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: deduplicated-imr.png, duplicated-imr.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-29431
> Currently with the change introduced by the JIRA this follows up on, 
> duplicate subtrees of the query plan can be shown if multiple 
> InMemoryTableScans read from the same persisted data:
>  !duplicated-imr.png!
> To prevent confusion, we should add an "InMemoryRelation" node that 
> represents the persisted data being read from, and use it to de-duplicate 
> shared plans like so:
> !deduplicated-imr.png!






[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30367:
-
Attachment: deduplicated-imr.png

> De-duplicate InMemoryTableScan cached plans in SQL UI
> -
>
> Key: SPARK-30367
> URL: https://issues.apache.org/jira/browse/SPARK-30367
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: deduplicated-imr.png, duplicated-imr.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-29431
>  
>  Currently with the change introduced by the JIRA this follows up on, 
> duplicate subtrees of the query plan can be shown if multiple 
> InMemoryTableScans read from the same persisted data:
>  !duplicated-imr.png!
>  
>  To prevent confusion, we should add an "InMemoryRelation" node that 
> represents the persisted data being read from, and use it to de-duplicate 
> shared plans like so:
>  
> !deduplicated-imr.png!






[jira] [Updated] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30368:
-
Attachment: w-metric.png

> Add computed rows metric to InMemoryRelation and show in SQL UI
> ---
>
> Key: SPARK-30368
> URL: https://issues.apache.org/jira/browse/SPARK-30368
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: w-metric.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-30367
> We should add a "number of computed rows" metric to InMemoryRelation. This 
> will show the user how many rows were computed using the InMemoryRelation's 
> cached plan (e.g. possibly zero rows if no data had to be computed, the same 
> amount as total rows read if all rows had to be computed, some subset of the 
> total rows read if some partitions had to be recomputed, etc) which would 
> help with determining how much work was done for this part of the query.
> An example with the metric where the InMemoryRelation's data was fully 
> computed from its plan:
>  
>  






[jira] [Created] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI

2019-12-27 Thread Max Thompson (Jira)
Max Thompson created SPARK-30368:


 Summary: Add computed rows metric to InMemoryRelation and show in 
SQL UI
 Key: SPARK-30368
 URL: https://issues.apache.org/jira/browse/SPARK-30368
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Max Thompson
 Attachments: w-metric.png

This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367

We should add a "number of computed rows" metric to InMemoryRelation. This will 
show the user how many rows were computed using the InMemoryRelation's cached 
plan (e.g. possibly zero rows if no data had to be computed, the same amount as 
total rows read if all rows had to be computed, some subset of the total rows 
read if some partitions had to be recomputed, etc) which would help with 
determining how much work was done for this part of the query.

An example with the metric where the InMemoryRelation's data was fully computed 
from its plan:
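One way the proposed metric could work (an assumption about the mechanism, 
modeled on the general pattern of wrapping an iterator with a counter; not the 
actual implementation):

```scala
import java.util.concurrent.atomic.AtomicLong

// Wrap the iterator that materializes the cached data; the counter
// advances only for rows actually produced by computation, so it stays
// at zero when the cache is served without recomputation and equals the
// total row count when everything had to be computed.
class CountingIterator[T](underlying: Iterator[T], counter: AtomicLong)
    extends Iterator[T] {
  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = { counter.incrementAndGet(); underlying.next() }
}
```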

 

 






[jira] [Updated] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30368:
-
Description: 
This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367

We should add a "number of computed rows" metric to InMemoryRelation. This will 
show the user how many rows were computed using the InMemoryRelation's cached 
plan (e.g. possibly zero rows if no data had to be computed, the same amount as 
total rows read if all rows had to be computed, some subset of the total rows 
read if some partitions had to be recomputed, etc) which would help with 
determining how much work was done for this part of the query.

An example with the metric where the InMemoryRelation's data was fully computed 
from its plan:

  !w-metric.png!

 

  was:
This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367

We should add a "number of computed rows" metric to InMemoryRelation. This will 
show the user how many rows were computed using the InMemoryRelation's cached 
plan (e.g. possibly zero rows if no data had to be computed, the same amount as 
total rows read if all rows had to be computed, some subset of the total rows 
read if some partitions had to be recomputed, etc) which would help with 
determining how much work was done for this part of the query.

An example with the metric where the InMemoryRelation's data was fully computed 
from its plan:

 

 


> Add computed rows metric to InMemoryRelation and show in SQL UI
> ---
>
> Key: SPARK-30368
> URL: https://issues.apache.org/jira/browse/SPARK-30368
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: w-metric.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-30367
> We should add a "number of computed rows" metric to InMemoryRelation. This 
> will show the user how many rows were computed using the InMemoryRelation's 
> cached plan (e.g. possibly zero rows if no data had to be computed, the same 
> amount as total rows read if all rows had to be computed, some subset of the 
> total rows read if some partitions had to be recomputed, etc) which would 
> help with determining how much work was done for this part of the query.
> An example with the metric where the InMemoryRelation's data was fully 
> computed from its plan:
>   !w-metric.png!
>  






[jira] [Created] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)
Max Thompson created SPARK-30369:


 Summary: Prune uncomputed children of InMemoryRelation
 Key: SPARK-30369
 URL: https://issues.apache.org/jira/browse/SPARK-30369
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Max Thompson


This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]
 
 Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
 !unpruned.png!
 
 To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large in a situation such as if multiple iterative queries are run 
that each use persisted data from the last query), the uncomputed subtrees 
could be removed before rendering the query plan:
 
 
 A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.
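The pruning described above can be sketched as a guarded tree walk (illustrative 
only; `PlanNode` and its `computed` flag are hypothetical stand-ins, not Spark's 
API):

```scala
// Hypothetical plan node with a flag marking whether its result was
// actually computed for this query.
case class PlanNode(name: String, computed: Boolean,
                    children: List[PlanNode] = Nil)

// Drop uncomputed subtrees before rendering, guarded by a config flag so
// users who want the full plan can simply disable the feature.
def pruneUncomputed(node: PlanNode, enabled: Boolean): Option[PlanNode] =
  if (enabled && !node.computed) None
  else Some(node.copy(children = node.children.flatMap(pruneUncomputed(_, enabled))))
```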






[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30369:
-
Description: 
This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]

Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
!unpruned.png!  


 To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large when, for example, multiple iterative queries each use 
persisted data from the previous query), the uncomputed subtrees could be 
removed before rendering the query plan:
 

A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.

  was:
This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]
 
 Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
 !unpruned.png! 
 
 To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large in a situation such as if multiple iterative queries are run 
that each use persisted data from the last query), the uncomputed subtrees 
could be removed before rendering the query plan:
 !pruned.png! 
 
 
 A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.


> Prune uncomputed children of InMemoryRelation
> -
>
> Key: SPARK-30369
> URL: https://issues.apache.org/jira/browse/SPARK-30369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: pruned.png, unpruned.png
>
>
> This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
> SQL UI JIRA URL]
> Currently with the changes introduced by the JIRAs this follows up on, if a 
> query persists data that is later read by another query, the uncomputed 
> subtree of the plan for the persisted data will be shown:
> !unpruned.png!  
>  To avoid showing uncomputed subtrees in the query plan (which may become 
> appreciably large in a situation such as if multiple iterative queries are 
> run that each use persisted data from the last query), the uncomputed 
> subtrees could be removed before rendering the query plan:
>  
> A configuration property should be added that enables this feature when set 
> to true. If a user wants to see the uncomputed subtrees, they can simply 
> disable the configuration property.






[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30369:
-
Attachment: unpruned.png

> Prune uncomputed children of InMemoryRelation
> -
>
> Key: SPARK-30369
> URL: https://issues.apache.org/jira/browse/SPARK-30369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: pruned.png, unpruned.png
>
>
> This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
> SQL UI JIRA URL]
>  
>  Currently with the changes introduced by the JIRAs this follows up on, if a 
> query persists data that is later read by another query, the uncomputed 
> subtree of the plan for the persisted data will be shown:
>  !unpruned.png! 
>  
>  To avoid showing uncomputed subtrees in the query plan (which may become 
> appreciably large in a situation such as if multiple iterative queries are 
> run that each use persisted data from the last query), the uncomputed 
> subtrees could be removed before rendering the query plan:
>  !pruned.png! 
>  
>  
>  A configuration property should be added that enables this feature when set 
> to true. If a user wants to see the uncomputed subtrees, they can simply 
> disable the configuration property.






[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30369:
-
Attachment: pruned.png

> Prune uncomputed children of InMemoryRelation
> -
>
> Key: SPARK-30369
> URL: https://issues.apache.org/jira/browse/SPARK-30369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: pruned.png, unpruned.png
>
>
> This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
> SQL UI JIRA URL]
>  
>  Currently with the changes introduced by the JIRAs this follows up on, if a 
> query persists data that is later read by another query, the uncomputed 
> subtree of the plan for the persisted data will be shown:
>  !unpruned.png! 
>  
>  To avoid showing uncomputed subtrees in the query plan (which may become 
> appreciably large in a situation such as if multiple iterative queries are 
> run that each use persisted data from the last query), the uncomputed 
> subtrees could be removed before rendering the query plan:
>  !pruned.png! 
>  
>  
>  A configuration property should be added that enables this feature when set 
> to true. If a user wants to see the uncomputed subtrees, they can simply 
> disable the configuration property.






[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30369:
-
Description: 
This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]

Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
 !unpruned.png!  

To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large when, for example, multiple iterative queries each use 
persisted data from the previous query), the uncomputed subtrees could be 
removed before rendering the query plan:

!pruned.png!

A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.

  was:
This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]

Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
!unpruned.png!  


 To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large in a situation such as if multiple iterative queries are run 
that each use persisted data from the last query), the uncomputed subtrees 
could be removed before rendering the query plan:
 

A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.


> Prune uncomputed children of InMemoryRelation
> -
>
> Key: SPARK-30369
> URL: https://issues.apache.org/jira/browse/SPARK-30369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: pruned.png, unpruned.png
>
>
> This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
> SQL UI JIRA URL]
> Currently with the changes introduced by the JIRAs this follows up on, if a 
> query persists data that is later read by another query, the uncomputed 
> subtree of the plan for the persisted data will be shown:
>  !unpruned.png!  
> To avoid showing uncomputed subtrees in the query plan (which may become 
> appreciably large in a situation such as if multiple iterative queries are 
> run that each use persisted data from the last query), the uncomputed 
> subtrees could be removed before rendering the query plan:
> !pruned.png!
> A configuration property should be added that enables this feature when set 
> to true. If a user wants to see the uncomputed subtrees, they can simply 
> disable the configuration property.






[jira] [Resolved] (SPARK-29013) Structurally equivalent subexpression elimination

2019-12-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-29013.
-
Resolution: Won't Fix

> Structurally equivalent subexpression elimination
> -
>
> Key: SPARK-29013
> URL: https://issues.apache.org/jira/browse/SPARK-29013
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Priority: Major
>
> We do semantically equivalent subexpression elimination in Spark SQL. However, 
> for expressions that are not semantically equivalent but are structurally 
> equivalent, the current subexpression elimination generates too many similar 
> functions. These functions share the same computation structure and differ 
> only in the input slots of the current processing row.
> For such expressions, we can generate just one function and pass in the input 
> slots at runtime.
> This can reduce the length of the generated code text and save compilation time.
>  
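The idea lends itself to a small sketch (hypothetical Python standing in for Spark's Java codegen): rather than emitting one specialized function per structurally equivalent expression, emit a single function parameterized by the input slot indices and supply the slots at runtime.

```python
# Sketch: today's codegen effectively produces one closure per expression
# (add(row[0], row[1]), add(row[2], row[3]), ...); the proposal emits a single
# shared function and passes the slot indices in at call time.

def make_specialized(i, j):
    # One generated function per expression instance (current behavior).
    def f(row):
        return row[i] + row[j]
    return f

def shared_add(row, i, j):
    # One shared function for all structurally equivalent "add" expressions.
    return row[i] + row[j]

row = [1, 2, 3, 4]
specialized = [make_specialized(0, 1), make_specialized(2, 3)]
results_specialized = [f(row) for f in specialized]
results_shared = [shared_add(row, i, j) for (i, j) in [(0, 1), (2, 3)]]
```

Both approaches compute the same results; the shared form just needs one function body regardless of how many expression instances exist, which is what shrinks the generated code text.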






[jira] [Commented] (SPARK-30366) Remove Redundant Information for InMemoryTableScan in SQL UI

2019-12-27 Thread Max Thompson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004310#comment-17004310
 ] 

Max Thompson commented on SPARK-30366:
--

Requesting myself as the assignee for all issues under this Epic.

> Remove Redundant Information for InMemoryTableScan in SQL UI
> 
>
> Key: SPARK-30366
> URL: https://issues.apache.org/jira/browse/SPARK-30366
> Project: Spark
>  Issue Type: Epic
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
>
> All the JIRAs within this epic are follow-ups for 
> https://issues.apache.org/jira/browse/SPARK-29431 
>  
>  This epic contains JIRAs for adding features to how InMemoryTableScan 
> operators and their children are displayed in the SQL tab of the Web UI, 
> aimed at removing redundant information that may confuse the user as to how 
> the query was actually executed.






[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation

2019-12-27 Thread Max Thompson (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Thompson updated SPARK-30369:
-
Description: 
This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367

Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
 !unpruned.png!  

To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large when, for example, multiple iterative queries each use 
persisted data from the previous query), the uncomputed subtrees could be 
removed before rendering the query plan:

!pruned.png!

A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.

  was:
This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in 
SQL UI JIRA URL]

Currently with the changes introduced by the JIRAs this follows up on, if a 
query persists data that is later read by another query, the uncomputed subtree 
of the plan for the persisted data will be shown:
 !unpruned.png!  

To avoid showing uncomputed subtrees in the query plan (which may become 
appreciably large in a situation such as if multiple iterative queries are run 
that each use persisted data from the last query), the uncomputed subtrees 
could be removed before rendering the query plan:

!pruned.png!

A configuration property should be added that enables this feature when set to 
true. If a user wants to see the uncomputed subtrees, they can simply disable 
the configuration property.


> Prune uncomputed children of InMemoryRelation
> -
>
> Key: SPARK-30369
> URL: https://issues.apache.org/jira/browse/SPARK-30369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Max Thompson
>Priority: Minor
> Attachments: pruned.png, unpruned.png
>
>
> This is a follow-up JIRA for: 
> https://issues.apache.org/jira/browse/SPARK-30367
> Currently with the changes introduced by the JIRAs this follows up on, if a 
> query persists data that is later read by another query, the uncomputed 
> subtree of the plan for the persisted data will be shown:
>  !unpruned.png!  
> To avoid showing uncomputed subtrees in the query plan (which may become 
> appreciably large in a situation such as if multiple iterative queries are 
> run that each use persisted data from the last query), the uncomputed 
> subtrees could be removed before rendering the query plan:
> !pruned.png!
> A configuration property should be added that enables this feature when set 
> to true. If a user wants to see the uncomputed subtrees, they can simply 
> disable the configuration property.






[jira] [Commented] (SPARK-24122) Allow automatic driver restarts on K8s

2019-12-27 Thread jinxing (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004323#comment-17004323
 ] 

jinxing commented on SPARK-24122:
-

I am all in on true native k8s support for Spark.

Using a Job or Deployment for driver pods sounds like the right direction to me.

Then how do you plan to schedule executors? Bare pods? A StatefulSet?

> Allow automatic driver restarts on K8s
> --
>
> Key: SPARK-24122
> URL: https://issues.apache.org/jira/browse/SPARK-24122
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Oz Ben-Ami
>Priority: Minor
>  Labels: bulk-closed
>
> [~foxish]
> Right now SparkSubmit creates the driver as a bare pod, rather than a managed 
> controller like a Deployment or a StatefulSet. This means there is no way to 
> guarantee automatic restarts, e.g. in case a node has an issue. Note that Pod 
> RestartPolicy does not apply if a node fails. A StatefulSet would allow us to 
> guarantee that, and keep the ability for executors to find the driver using 
> DNS.
> This is particularly helpful for long-running streaming workloads, where we 
> currently use {{yarn.resourcemanager.am.max-attempts}} with YARN. I can 
> confirm that Spark Streaming and Structured Streaming applications can be 
> made to recover from such a restart, with the help of checkpointing. The 
> executors will have to be started again by the driver, but this should not be 
> a problem.
> For batch processing, we could alternatively use Kubernetes {{Job}} objects, 
> which restart pods on failure but not success. For example, note the 
> semantics provided by the {{kubectl run}} 
> [command|https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#run]
>  * {{--restart=Never}}: bare Pod
>  * {{--restart=Always}}: Deployment
>  * {{--restart=OnFailure}}: Job
> https://github.com/apache-spark-on-k8s/spark/issues/288
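The {{--restart=OnFailure}} option described above maps onto a manifest roughly like the following sketch of a Job-managed driver pod (all names and the image are hypothetical, not an actual Spark-generated manifest):

```yaml
# Hypothetical sketch: a Kubernetes Job wrapping a Spark driver pod, so the
# driver is rerun on failure (including node loss) but not after success.
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-driver-job            # hypothetical name
spec:
  backoffLimit: 4                   # retry the driver up to 4 times on failure
  template:
    spec:
      restartPolicy: OnFailure      # Job semantics: restart failed pods only
      containers:
        - name: spark-driver
          image: example/spark:2.3.0          # hypothetical image
          args: ["driver", "--class", "org.example.Main"]
```

Because the Job controller reschedules the pod rather than restarting it in place, this also covers the node-failure case that a bare pod's RestartPolicy does not.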






[jira] [Commented] (SPARK-27764) Feature Parity between PostgreSQL and Spark

2019-12-27 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004360#comment-17004360
 ] 

Takeshi Yamamuro commented on SPARK-27764:
--

ok, I'll clean up later.

> Feature Parity between PostgreSQL and Spark
> ---
>
> Key: SPARK-27764
> URL: https://issues.apache.org/jira/browse/SPARK-27764
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> PostgreSQL is one of the most advanced open-source databases. This umbrella 
> Jira tracks the missing features and bugs. 
>  






[jira] [Commented] (SPARK-27878) Support ARRAY(sub-SELECT) expressions

2019-12-27 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004361#comment-17004361
 ] 

Takeshi Yamamuro commented on SPARK-27878:
--

nvm, it's ok just to reopen if we revisit this ;)

> Support ARRAY(sub-SELECT) expressions
> -
>
> Key: SPARK-27878
> URL: https://issues.apache.org/jira/browse/SPARK-27878
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Construct an array from the results of a subquery. In this form, the array 
> constructor is written with the key word {{ARRAY}} followed by a 
> parenthesized (not bracketed) subquery. For example:
> {code:sql}
> SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
>  array
> ---
>  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
> (1 row)
> {code}
> More details:
>  
> [https://www.postgresql.org/docs/9.3/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS]
> [https://github.com/postgres/postgres/commit/730840c9b649a48604083270d48792915ca89233]






[jira] [Created] (SPARK-30370) Update SqlBase.g4 to combine namespace and database tokens.

2019-12-27 Thread Terry Kim (Jira)
Terry Kim created SPARK-30370:
-

 Summary: Update SqlBase.g4 to combine namespace and database 
tokens.
 Key: SPARK-30370
 URL: https://issues.apache.org/jira/browse/SPARK-30370
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim


Instead of using `(database | NAMESPACE)` in the grammar, create a rule

namespace : NAMESPACE | DATABASE | SCHEMA;

and use it throughout.






[jira] [Resolved] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local

2019-12-27 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-28670.
--
Fix Version/s: 3.0.0
 Assignee: Sandeep Katta
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/25399

> [UDF] create permanent UDF does not throw Exception if jar does not exist in 
> HDFS path or Local
> ---
>
> Key: SPARK-28670
> URL: https://issues.apache.org/jira/browse/SPARK-28670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Sandeep Katta
>Priority: Minor
> Fix For: 3.0.0
>
>
>  jdbc:hive2://10.18.18.214:23040/default> create function addm  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.241 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm  
> AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
> ERROR : Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
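The core idea of the fix can be sketched as a pre-registration existence check (a hypothetical Python sketch; the actual fix in the linked PR lives in Spark's Scala function-registration path):

```python
# Hypothetical sketch: validate that the UDF's jar resource can be resolved
# before registering a permanent function, instead of failing lazily at first
# use the way the temporary-function path does.
import os

def create_function_checked(name, class_name, jar_uri, exists=os.path.exists):
    """Register a function only if its jar exists; raise eagerly otherwise."""
    if not exists(jar_uri):
        raise FileNotFoundError(f"resource not found: {jar_uri}")
    # Registration itself is elided; return the catalog entry for illustration.
    return {"name": name, "class": class_name, "jar": jar_uri}
```

The `exists` parameter stands in for whatever resolver the catalog uses (local filesystem, HDFS, etc.); injecting it keeps the check testable.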






[jira] [Updated] (SPARK-29947) Improve ResolveRelations performance

2019-12-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-29947:

Summary: Improve ResolveRelations performance  (was: Improve 
ResolveRelations and ResolveTables performance)

> Improve ResolveRelations performance
> 
>
> Key: SPARK-29947
> URL: https://issues.apache.org/jira/browse/SPARK-29947
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> For SQL in SPARK-29606.  The physical plan in:
> {noformat}
> == Physical Plan ==
> *(12) HashAggregate(keys=[cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE WHEN 
> (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN 
> cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 
> END#1367], functions=[sum(coalesce(bid_count#34, 0)), 
> sum(coalesce(ck_trans_count#35, 0)), sum(coalesce(ended_bid_count#36, 0)), 
> sum(coalesce(ended_lstg_count#37, 0)), 
> sum(coalesce(ended_success_lstg_count#38, 0)), 
> sum(coalesce(item_sold_count#39, 0)), sum(coalesce(new_lstg_count#40, 0)), 
> sum(coalesce(gmv_us_amt#41, 0.00)), sum(coalesce(gmv_slr_lc_amt#42, 0.00)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_insrtn_fee_us_amt#46, 
> 0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_insrtn_crd_us_amt#50, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fetr_fee_us_amt#54, 
> 0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_fetr_crd_us_amt#58, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fv_fee_us_amt#62, 
> 0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_fv_crd_us_amt#67, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_l_fee_us_amt#72, 
> 0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_othr_l_crd_us_amt#76, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_nl_fee_us_amt#80,
>  0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_othr_nl_crd_us_amt#84, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_slr_tools_fee_us_amt#88,
>  0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_slr_tools_crd_us_amt#92, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> sum(coalesce(rvnu_unasgnd_us_amt#96, 0.00)), 
> sum((coalesce(rvnu_transaction_us_amt#112, 0.0) + 
> coalesce(rvnu_transaction_crd_us_amt#115, 0.0))), 
> sum((coalesce(rvnu_total_us_amt#118, 0.0) + 
> coalesce(rvnu_total_crd_us_amt#121, 0.0)))])
> +- Exchange hashpartitioning(cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE WHEN 
> (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN 
> cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 
> END#1367, 200), true, [id=#403]
>+- *(11) HashAggregate(keys=[cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE 
> WHEN (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END AS CASE WHEN 
> (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN 
> cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 END AS 
> CASE WHEN cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 
> ELSE 1 END#1367], functions=[partial_sum(coalesce(bid_count#34, 0)), 
> partial_sum(coalesce(ck_trans_count#35, 0)), 
> partial_sum(coalesce(ended_bid_count#36, 0)), 
> partial_sum(coalesce(ended_lstg_count#37, 0)), 
> partial_sum(coalesce(ended_success_lstg_count#38, 0)), 
> partial_sum(coalesce(item_sold_count#39, 0)), 
> partial_sum(coalesce(new_lstg_count#40, 0)), 
> partial_sum(coalesce(gmv_us_amt#41, 0.00)), 
> partial_sum(coalesce(gmv_slr_lc_amt#42, 0.00)), 
> partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_insrtn_fee_us_amt#46,
>  0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_insrtn_crd_us_amt#50, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fetr_fee_us_amt#54,
>  0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_fetr_crd_us_amt#58, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fv_fee_us_amt#62,
>  0.00) as decimal(19,6))) + 
> promote_precision(cast(coalesce(rvnu_fv_crd_us_amt#67, 0.00) as 
> decimal(19,6, DecimalType(19,6), true)), 
> partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_l_fee_us_amt#72,
>  0.00) as de