[jira] [Created] (SPARK-30364) The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le
AK97 created SPARK-30364: Summary: The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le Key: SPARK-30364 URL: https://issues.apache.org/jira/browse/SPARK-30364 Project: Spark Issue Type: Test Components: Build Affects Versions: 2.4.0 Environment: os: rhel 7.6 arch: ppc64le Reporter: AK97 I have been trying to build Apache Spark on rhel_7.6/ppc64le. The spark-streaming-kafka-0-10_2.11 test cases are failing with the following error: [ERROR] /opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85: Symbol 'term org.eclipse' is missing from the classpath. This symbol is required by 'method org.apache.spark.metrics.MetricsSystem.getServletHandlers'. Make sure that term eclipse is in your classpath and check for conflicting dependencies with `-Ylog-classpath`. A full rebuild may help if 'MetricsSystem.class' was compiled against an incompatible version of org. [ERROR] testUtils.sendMessages(topic, data.toArray) ^ I would like some help understanding the cause. I am running it on a high-end VM with good connectivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30364) The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-30364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AK97 updated SPARK-30364: - Description: I have been trying to build the Apache Spark on rhel_7.6/ppc64le; however, the spark-streaming-kafka-0-10_2.11 test cases are failing with following error : [ERROR] /opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85: Symbol 'term org.eclipse' is missing from the classpath. This symbol is required by 'method org.apache.spark.metrics.MetricsSystem.getServletHandlers'. Make sure that term eclipse is in your classpath and check for conflicting dependencies with `-Ylog-classpath`. A full rebuild may help if 'MetricsSystem.class' was compiled against an incompatible version of org. [ERROR] testUtils.sendMessages(topic, data.toArray) ^ Would like some help on understanding the cause for the same . I am running it on a High end VM with good connectivity. was: I have been trying to build the Apache Spark on rhel_7.6/ppc64le. The spark-streaming-kafka-0-10_2.11 test cases are failing with following error : [ERROR] /opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85: Symbol 'term org.eclipse' is missing from the classpath. This symbol is required by 'method org.apache.spark.metrics.MetricsSystem.getServletHandlers'. Make sure that term eclipse is in your classpath and check for conflicting dependencies with `-Ylog-classpath`. A full rebuild may help if 'MetricsSystem.class' was compiled against an incompatible version of org. [ERROR] testUtils.sendMessages(topic, data.toArray) ^ Would like some help on understanding the cause for the same . I am running it on a High end VM with good connectivity. 
> The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le > - > > Key: SPARK-30364 > URL: https://issues.apache.org/jira/browse/SPARK-30364 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 2.4.0 > Environment: os: rhel 7.6 > arch: ppc64le >Reporter: AK97 >Priority: Major > > I have been trying to build the Apache Spark on rhel_7.6/ppc64le; however, > the spark-streaming-kafka-0-10_2.11 test cases are failing with following > error : > [ERROR] > /opt/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85: > Symbol 'term org.eclipse' is missing from the classpath. > This symbol is required by 'method > org.apache.spark.metrics.MetricsSystem.getServletHandlers'. > Make sure that term eclipse is in your classpath and check for conflicting > dependencies with `-Ylog-classpath`. > A full rebuild may help if 'MetricsSystem.class' was compiled against an > incompatible version of org. > [ERROR] testUtils.sendMessages(topic, data.toArray) >^ > Would like some help on understanding the cause for the same . I am running > it on a High end VM with good connectivity.
[jira] [Commented] (SPARK-30332) When running sql query with limit catalyst throw StackOverFlow exception
[ https://issues.apache.org/jira/browse/SPARK-30332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004146#comment-17004146 ] Izek Greenfield commented on SPARK-30332: - The task failed, do not get any result > When running sql query with limit catalyst throw StackOverFlow exception > - > > Key: SPARK-30332 > URL: https://issues.apache.org/jira/browse/SPARK-30332 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark version 3.0.0-preview >Reporter: Izek Greenfield >Priority: Major > > Running that SQL: > {code:sql} > SELECT BT_capital.asof_date, > BT_capital.run_id, > BT_capital.v, > BT_capital.id, > BT_capital.entity, > BT_capital.level_1, > BT_capital.level_2, > BT_capital.level_3, > BT_capital.level_4, > BT_capital.level_5, > BT_capital.level_6, > BT_capital.path_bt_capital, > BT_capital.line_item, > t0.target_line_item, > t0.line_description, > BT_capital.col_item, > BT_capital.rep_amount, > root.orgUnitId, > root.cptyId, > root.instId, > root.startDate, > root.maturityDate, > root.amount, > root.nominalAmount, > root.quantity, > root.lkupAssetLiability, > root.lkupCurrency, > root.lkupProdType, > root.interestResetDate, > root.interestResetTerm, > root.noticePeriod, > root.historicCostAmount, > root.dueDate, > root.lkupResidence, > root.lkupCountryOfUltimateRisk, > root.lkupSector, > root.lkupIndustry, > root.lkupAccountingPortfolioType, > root.lkupLoanDepositTerm, > root.lkupFixedFloating, > root.lkupCollateralType, > root.lkupRiskType, > root.lkupEligibleRefinancing, > root.lkupHedging, > root.lkupIsOwnIssued, > root.lkupIsSubordinated, > root.lkupIsQuoted, > root.lkupIsSecuritised, > root.lkupIsSecuritisedServiced, > root.lkupIsSyndicated, > root.lkupIsDeRecognised, > root.lkupIsRenegotiated, > root.lkupIsTransferable, > root.lkupIsNewBusiness, > root.lkupIsFiduciary, > root.lkupIsNonPerforming, > root.lkupIsInterGroup, > root.lkupIsIntraGroup, > root.lkupIsRediscounted, > 
root.lkupIsCollateral, > root.lkupIsExercised, > root.lkupIsImpaired, > root.facilityId, > root.lkupIsOTC, > root.lkupIsDefaulted, > root.lkupIsSavingsPosition, > root.lkupIsForborne, > root.lkupIsDebtRestructuringLoan, > root.interestRateAAR, > root.interestRateAPRC, > root.custom1, > root.custom2, > root.custom3, > root.lkupSecuritisationType, > root.lkupIsCashPooling, > root.lkupIsEquityParticipationGTE10, > root.lkupIsConvertible, > root.lkupEconomicHedge, > root.lkupIsNonCurrHeldForSale, > root.lkupIsEmbeddedDerivative, > root.lkupLoanPurpose, > root.lkupRegulated, > root.lkupRepaymentType, > root.glAccount, > root.lkupIsRecourse, > root.lkupIsNotFullyGuaranteed, > root.lkupImpairmentStage, > root.lkupIsEntireAmountWrittenOff, > root.lkupIsLowCreditRisk, > root.lkupIsOBSWithinIFRS9, > root.lkupIsUnderSpecialSurveillance, > root.lkupProtection, > root.lkupIsGeneralAllowance, > root.lkupSectorUltimateRisk, > root.cptyOrgUnitId, > root.name, > root.lkupNationality, > root.lkupSize, > root.lkupIsSPV, > root.lkupIsCentralCounterparty, > root.lkupIsMMRMFI, > root.lkupIsKeyManagement, > root.lkupIsOtherRelatedParty, > root.lkupResidenceProvince, > root.lkupIsTradingBook, > root.entityHierarchy_entityId, > root.entityHierarchy_Residence, > root.lkupLocalCurrency, > root.cpty_entityhierarchy_entityId, > root.lkupRelationship, > root.cpty_lkupRelationship, > root.entityNationality, > root.lkupRepCurrency, > root.startDateFinancialYear, > root.numEmployees, > root.numEmployeesTotal, > root.collateralAmount, > root.guaranteeAmount, > root.impairmentSpecificIndividual, > root.impairmentSpecificCollective, > root.impairmentGeneral, > root.creditRiskAmount, > root.provisionSpecificIndividual, > root.provisionSpecificCollective, > root.provisionGeneral, > root.writeOffAmount, > root.interest, > root.fairValueAmount, > root.grossCarryingAmount, > root.carryingAmount, > root.code, > root.lkupInstrumentType, > root.price, > root.amountAtIssue, > root.yield, > 
root.totalFacilityAmount, > root.facility_rate, > root.spec_indiv_est, > root.spec_coll_est, > root.coll_inc_loss, > root.impairment_amount, > root.provision_amount, > root.accumulated_impairment, > root.exclusionFlag, > root.lkupIsHoldingCompany, > root.instrument_startDate, > root.entityResidence, > fxRate.enumerator, > fxRate.lkupFromCurrency, > fxRate.rate, > fxRate.custom1, > fxRate.custom2, > fxRate.custom3, > GB_position.lkupIsECGDGuaranteed, > GB_position.lkupIsMultiAcctOffsetMortgage, > GB_position.lkupIsIndexLinked, > GB_position.lkupIsRetail, > GB_position.lkupCollateralLocation, > GB_position.percentAboveBBR, > GB_position.lkupIsMoreInArrears, > GB_position.lkupIsArrearsCapitalised, > GB_position.lkupCollateralPossession, > GB_position.lkupIsLifetimeMortgage, > GB_position
[jira] [Created] (SPARK-30365) When deploy mode is a client, why doesn't it support remote "spark.files" download?
wangzhun created SPARK-30365: Summary: When deploy mode is a client, why doesn't it support remote "spark.files" download? Key: SPARK-30365 URL: https://issues.apache.org/jira/browse/SPARK-30365 Project: Spark Issue Type: Question Components: Spark Submit Affects Versions: 2.3.2 Environment: {code:java} ./bin/spark-submit \ --master yarn \ --deploy-mode client \ ..{code} Reporter: wangzhun {code:java} // In client mode, download remote files. var localPrimaryResource: String = null var localJars: String = null var localPyFiles: String = null if (deployMode == CLIENT) { localPrimaryResource = Option(args.primaryResource).map { downloadFile(_, targetDir, sparkConf, hadoopConf, secMgr) }.orNull localJars = Option(args.jars).map { downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr) }.orNull localPyFiles = Option(args.pyFiles).map { downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr) }.orNull } {code} The above Spark 2.3 SparkSubmit code does not download the files referenced by "spark.files". I think it is possible to download the remote files locally and add them to the classpath. For example, --files could then support a remote hive-site.xml.
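A hedged sketch of how the quoted client-mode block could be extended to also localize `spark.files` (this mirrors the helpers and `args` fields shown in the quoted code; it is an illustration of the reporter's suggestion, not an actual Spark patch):

```scala
// Hypothetical extension of SparkSubmit's client-mode download block:
// treat --files / "spark.files" the same way jars and Python files are
// treated, so remote files (e.g. a hive-site.xml on HDFS) become local.
var localFiles: String = null
if (deployMode == CLIENT) {
  localFiles = Option(args.files).map {
    downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
  }.orNull
}
```

The localized paths could then be added to the driver classpath, which is what the question is asking for.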
[jira] [Assigned] (SPARK-30356) Codegen support for the function str_to_map
[ https://issues.apache.org/jira/browse/SPARK-30356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30356: --- Assignee: Kent Yao > Codegen support for the function str_to_map > --- > > Key: SPARK-30356 > URL: https://issues.apache.org/jira/browse/SPARK-30356 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > add codegen support to str_to_map
[jira] [Resolved] (SPARK-30356) Codegen support for the function str_to_map
[ https://issues.apache.org/jira/browse/SPARK-30356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30356. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27013 [https://github.com/apache/spark/pull/27013] > Codegen support for the function str_to_map > --- > > Key: SPARK-30356 > URL: https://issues.apache.org/jira/browse/SPARK-30356 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > add codegen support to str_to_map
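For context, a minimal spark-shell sketch of the function whose codegen was added (the delimiters shown are `str_to_map`'s documented defaults; the codegen change affects generated-code performance, not the function's semantics):

```scala
// str_to_map(text, pairDelim, keyValueDelim) splits a string into a map;
// pairDelim defaults to "," and keyValueDelim to ":".
spark.sql("SELECT str_to_map('a:1,b:2,c:3', ',', ':') AS m")
  .show(truncate = false)
// m is the map a -> 1, b -> 2, c -> 3
```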
[jira] [Resolved] (SPARK-30304) When the specified shufflemanager is incorrect, print the prompt.
[ https://issues.apache.org/jira/browse/SPARK-30304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-30304. -- Resolution: Won't Fix > When the specified shufflemanager is incorrect, print the prompt. > - > > Key: SPARK-30304 > URL: https://issues.apache.org/jira/browse/SPARK-30304 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: Jiaqi Li >Priority: Trivial > > During the instantiation of the specified `ShuffleManager`, if the > configuration is wrong, the log should print a helpful hint. > before: > {code:java} > java.lang.ClassNotFoundException: hash > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:206) > at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:274) > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338) > at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) > at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:462) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:131) > at > org.apache.spark.SortShuffleSuite.$anonfun$new$1(SortShuffleSuite.scala:67) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > {code} > >
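The kind of hint the reporter asked for could look like the sketch below (a self-contained illustration, not Spark's actual code path; Spark's instantiation goes through `Utils.classForName` in `SparkEnv`, and `sort`/`tungsten-sort` are the short names it accepts):

```scala
// Sketch: fail with an explanatory message instead of a bare
// ClassNotFoundException when spark.shuffle.manager is misconfigured
// (e.g. the long-removed short name "hash").
def loadShuffleManagerClass(name: String): Class[_] =
  try {
    Class.forName(name) // stand-in for Spark's Utils.classForName
  } catch {
    case e: ClassNotFoundException =>
      throw new IllegalArgumentException(
        s"Unknown shuffle manager '$name'. Valid values are 'sort', " +
        "'tungsten-sort', or a fully qualified ShuffleManager class name.", e)
  }
```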
[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks
[ https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004261#comment-17004261 ] daile commented on SPARK-29596: --- [~hyukjin.kwon] I checked the problem and reproduced in 2.4.4 version and will raise PR soon > Task duration not updating for running tasks > > > Key: SPARK-29596 > URL: https://issues.apache.org/jira/browse/SPARK-29596 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.2 >Reporter: Bharati Jadhav >Priority: Major > Attachments: Screenshot_Spark_live_WebUI.png > > > When looking at the task metrics for running tasks in the task table for the > related stage, the duration column is not updated until the task has > succeeded. The duration values are reported empty or 0 ms until the task has > completed. This is a change in behavior, from earlier versions, when the task > duration was continuously updated while the task was running. The missing > duration values can be observed for both short and long running tasks and for > multiple applications. > > To reproduce this, one can run any code from the spark-shell and observe the > missing duration values for any running task. Only when the task succeeds is > the duration value populated in the UI. > >
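The repro the reporter describes can be as simple as a deliberately slow job in spark-shell (a sketch; any long-running tasks will do):

```scala
// While this runs, open the stage's task table in the Web UI: per the
// report, the Duration column stays empty / 0 ms until each task succeeds.
spark.range(8).repartition(8).foreach(_ => Thread.sleep(60 * 1000L))
```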
[jira] [Created] (SPARK-30366) Remove Redundant Information for InMemoryTableScan in SQL UI
Max Thompson created SPARK-30366: Summary: Remove Redundant Information for InMemoryTableScan in SQL UI Key: SPARK-30366 URL: https://issues.apache.org/jira/browse/SPARK-30366 Project: Spark Issue Type: Epic Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Max Thompson All the JIRAs within this epic are follow-ups for https://issues.apache.org/jira/browse/SPARK-29431 This epic contains JIRAs for adding features to how InMemoryTableScan operators and their children are displayed in the SQL tab of the Web UI, aimed at removing redundant information that may confuse the user as to how the query was actually executed.
[jira] [Created] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI
Max Thompson created SPARK-30367: Summary: De-duplicate InMemoryTableScan cached plans in SQL UI Key: SPARK-30367 URL: https://issues.apache.org/jira/browse/SPARK-30367 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Max Thompson This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431 Currently with the change introduced by the JIRA this follows up on, duplicate subtrees of the query plan can be shown if multiple InMemoryTableScans read from the same persisted data: !duplicated-imr.png! To prevent confusion, we should add an "InMemoryRelation" node that represents the persisted data being read from, and use it to de-duplicate shared plans like so: !deduplicated-imr.png!
[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30367: - Attachment: duplicated-imr.png > De-duplicate InMemoryTableScan cached plans in SQL UI > - > > Key: SPARK-30367 > URL: https://issues.apache.org/jira/browse/SPARK-30367 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: duplicated-imr.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-29431 > > Currently with the change introduced by the JIRA this follows up on, > duplicate subtrees of the query plan can be shown if multiple > InMemoryTableScans read from the same persisted data: > !duplicated-imr.png! > > To prevent confusion, we should add an "InMemoryRelation" node that > represents the persisted data being read from, and use it to de-duplicate > shared plans like so: > > !deduplicated-imr.png!
[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30367: - Description: This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431 Currently with the change introduced by the JIRA this follows up on, duplicate subtrees of the query plan can be shown if multiple InMemoryTableScans read from the same persisted data: !duplicated-imr.png! To prevent confusion, we should add an "InMemoryRelation" node that represents the persisted data being read from, and use it to de-duplicate shared plans like so: !deduplicated-imr.png! was: This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-29431 Currently with the change introduced by the JIRA this follows up on, duplicate subtrees of the query plan can be shown if multiple InMemoryTableScans read from the same persisted data: !SKF6TLGsASUVORK5CYII=! To prevent confusion, we should add an "InMemoryRelation" node that represents the persisted data being read from, and use it to de-duplicate shared plans like so: !E0KYCAIKYAALQQMQUAoIWIKQAALURMAQBoIWIKAEALEVMAAFqImAIA0ELEFACAFiKmAAC0EDEFAKCFiCkAAC1ETAEAaCFiCgBACxFTAABa6H8BsISskzFa3iEASUVORK5CYII=! > De-duplicate InMemoryTableScan cached plans in SQL UI > - > > Key: SPARK-30367 > URL: https://issues.apache.org/jira/browse/SPARK-30367 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: deduplicated-imr.png, duplicated-imr.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-29431 > Currently with the change introduced by the JIRA this follows up on, > duplicate subtrees of the query plan can be shown if multiple > InMemoryTableScans read from the same persisted data: > !duplicated-imr.png! 
> To prevent confusion, we should add an "InMemoryRelation" node that > represents the persisted data being read from, and use it to de-duplicate > shared plans like so: > !deduplicated-imr.png!
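The duplicated-subtree situation described above can be reproduced with a self-join on cached data (spark-shell sketch):

```scala
// Both sides of the join scan the same InMemoryRelation, so the SQL tab
// currently renders the cached plan's subtree twice.
val df = spark.range(100).selectExpr("id", "id * 2 AS doubled").cache()
df.count()                  // materialize the cache
df.join(df, "id").collect() // two InMemoryTableScans over one cached plan
```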
[jira] [Updated] (SPARK-30367) De-duplicate InMemoryTableScan cached plans in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30367: - Attachment: deduplicated-imr.png > De-duplicate InMemoryTableScan cached plans in SQL UI > - > > Key: SPARK-30367 > URL: https://issues.apache.org/jira/browse/SPARK-30367 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: deduplicated-imr.png, duplicated-imr.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-29431 > > Currently with the change introduced by the JIRA this follows up on, > duplicate subtrees of the query plan can be shown if multiple > InMemoryTableScans read from the same persisted data: > !duplicated-imr.png! > > To prevent confusion, we should add an "InMemoryRelation" node that > represents the persisted data being read from, and use it to de-duplicate > shared plans like so: > > !deduplicated-imr.png!
[jira] [Updated] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30368: - Attachment: w-metric.png > Add computed rows metric to InMemoryRelation and show in SQL UI > --- > > Key: SPARK-30368 > URL: https://issues.apache.org/jira/browse/SPARK-30368 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: w-metric.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-30367 > We should add a "number of computed rows" metric to InMemoryRelation. This > will show the user how many rows were computed using the InMemoryRelation's > cached plan (e.g. possibly zero rows if no data had to be computed, the same > amount as total rows read if all rows had to be computed, some subset of the > total rows read if some partitions had to be recomputed, etc) which would > help with determining how much work was done for this part of the query. > An example with the metric where the InMemoryRelation's data was fully > computed from its plan: > >
[jira] [Created] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI
Max Thompson created SPARK-30368: Summary: Add computed rows metric to InMemoryRelation and show in SQL UI Key: SPARK-30368 URL: https://issues.apache.org/jira/browse/SPARK-30368 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Max Thompson Attachments: w-metric.png This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367 We should add a "number of computed rows" metric to InMemoryRelation. This will show the user how many rows were computed using the InMemoryRelation's cached plan (e.g. possibly zero rows if no data had to be computed, the same amount as total rows read if all rows had to be computed, some subset of the total rows read if some partitions had to be recomputed, etc) which would help with determining how much work was done for this part of the query. An example with the metric where the InMemoryRelation's data was fully computed from its plan:
[jira] [Updated] (SPARK-30368) Add computed rows metric to InMemoryRelation and show in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30368: - Description: This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367 We should add a "number of computed rows" metric to InMemoryRelation. This will show the user how many rows were computed using the InMemoryRelation's cached plan (e.g. possibly zero rows if no data had to be computed, the same amount as total rows read if all rows had to be computed, some subset of the total rows read if some partitions had to be recomputed, etc) which would help with determining how much work was done for this part of the query. An example with the metric where the InMemoryRelation's data was fully computed from its plan: !w-metric.png! was: This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367 We should add a "number of computed rows" metric to InMemoryRelation. This will show the user how many rows were computed using the InMemoryRelation's cached plan (e.g. possibly zero rows if no data had to be computed, the same amount as total rows read if all rows had to be computed, some subset of the total rows read if some partitions had to be recomputed, etc) which would help with determining how much work was done for this part of the query. An example with the metric where the InMemoryRelation's data was fully computed from its plan: > Add computed rows metric to InMemoryRelation and show in SQL UI > --- > > Key: SPARK-30368 > URL: https://issues.apache.org/jira/browse/SPARK-30368 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: w-metric.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-30367 > We should add a "number of computed rows" metric to InMemoryRelation. 
This > will show the user how many rows were computed using the InMemoryRelation's > cached plan (e.g. possibly zero rows if no data had to be computed, the same > amount as total rows read if all rows had to be computed, some subset of the > total rows read if some partitions had to be recomputed, etc) which would > help with determining how much work was done for this part of the query. > An example with the metric where the InMemoryRelation's data was fully > computed from its plan: > !w-metric.png! >
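A sketch of the scenarios the proposed metric would distinguish (spark-shell; the metric itself is the proposal, not an existing API):

```scala
val df = spark.range(1000).cache()
df.count()   // first run: every row is computed from the cached plan
df.count()   // second run: served from cache, so computed rows would be 0
df.unpersist()
```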
[jira] [Created] (SPARK-30369) Prune uncomputed children of InMemoryRelation
Max Thompson created SPARK-30369: Summary: Prune uncomputed children of InMemoryRelation Key: SPARK-30369 URL: https://issues.apache.org/jira/browse/SPARK-30369 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Max Thompson This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: !pruned.png! A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property.
[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30369: - Description: This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. was: This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !bLgZ04NuAvvtXpD8SABA xBkAgChDnAEAiDLEGQCAKEOcAQCIMsQZAIAoQ5wBAIgyxBkAgChDnAEAiDLEGQCAKEOcAQCIMsQZAIAoQ5wBAIgyxBkAgChDnAEAiDLEGQCAKEOcAQCIMv8P7NCckWtEmyMASUVORK5CYII=! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: !dGo9o CEEIYSEIIIcQcDCQhhBBiBgaSEEIIMQMDSQghhJiBgSSEEELMwEASQgghZmAgCSGEEDMwkIQQQogZGEhCCCHEDAwkIYQQYgYGkhBCCDEDA0kIIYSYgYEkhBBCzMBAEkIIIWZgIAkhhBAzMJCEEEKIGRhIQgghxAwMJCGEEGIGBpIQQggxAwNJCCGEmIGBJIQQQszAQBJCCCFm P8Bum1cUi kSH0ASUVORK5CYII=! 
A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. > Prune uncomputed children of InMemoryRelation > - > > Key: SPARK-30369 > URL: https://issues.apache.org/jira/browse/SPARK-30369 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: pruned.png, unpruned.png > > > This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in > SQL UI JIRA URL] > Currently with the changes introduced by the JIRAs this follows up on, if a > query persists data that is later read by another query, the uncomputed > subtree of the plan for the persisted data will be shown: > !unpruned.png! > To avoid showing uncomputed subtrees in the query plan (which may become > appreciably large in a situation such as if multiple iterative queries are > run that each use persisted data from the last query), the uncomputed > subtrees could be removed before rendering the query plan: > > A configuration property should be added that enables this feature when set > to true. If a user wants to see the uncomputed subtrees, they can simply > disable the configuration property. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
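The pruning proposed in SPARK-30369 can be sketched in plain Python, under the assumption that a query plan is a simple node tree and that `InMemoryRelation` nodes are flagged as cached; the `PlanNode` class and `prune_uncomputed` helper are illustrative names, not Spark's actual API:

```python
# Hypothetical sketch of removing uncomputed subtrees before rendering a plan.
# A node flagged `cached` stands in for an InMemoryRelation whose children
# were never computed because the data was read from the cache.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    name: str
    cached: bool = False
    children: List["PlanNode"] = field(default_factory=list)

def prune_uncomputed(node: PlanNode) -> PlanNode:
    """Return a copy of the tree in which the children of every cached
    node have been dropped, so the UI shows only what actually ran."""
    if node.cached:
        return PlanNode(node.name, node.cached, [])
    return PlanNode(node.name, node.cached,
                    [prune_uncomputed(c) for c in node.children])

plan = PlanNode("Project", children=[
    PlanNode("InMemoryRelation", cached=True, children=[
        PlanNode("Filter", children=[PlanNode("Scan")])])])

pruned = prune_uncomputed(plan)
print(len(pruned.children[0].children))  # 0: the uncomputed subtree is gone
```

Gating this behind a configuration property, as the issue suggests, would amount to calling the prune step only when the flag is enabled and rendering the original tree otherwise.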
[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30369: - Attachment: unpruned.png > Prune uncomputed children of InMemoryRelation > - > > Key: SPARK-30369 > URL: https://issues.apache.org/jira/browse/SPARK-30369 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: pruned.png, unpruned.png > > > This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in > SQL UI JIRA URL] > > Currently with the changes introduced by the JIRAs this follows up on, if a > query persists data that is later read by another query, the uncomputed > subtree of the plan for the persisted data will be shown: > !unpruned.png! > > To avoid showing uncomputed subtrees in the query plan (which may become > appreciably large in a situation such as if multiple iterative queries are > run that each use persisted data from the last query), the uncomputed > subtrees could be removed before rendering the query plan: > !pruned.png! > > > A configuration property should be added that enables this feature when set > to true. If a user wants to see the uncomputed subtrees, they can simply > disable the configuration property. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30369: - Attachment: pruned.png > Prune uncomputed children of InMemoryRelation > - > > Key: SPARK-30369 > URL: https://issues.apache.org/jira/browse/SPARK-30369 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: pruned.png, unpruned.png > > > This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in > SQL UI JIRA URL] > > Currently with the changes introduced by the JIRAs this follows up on, if a > query persists data that is later read by another query, the uncomputed > subtree of the plan for the persisted data will be shown: > !unpruned.png! > > To avoid showing uncomputed subtrees in the query plan (which may become > appreciably large in a situation such as if multiple iterative queries are > run that each use persisted data from the last query), the uncomputed > subtrees could be removed before rendering the query plan: > !pruned.png! > > > A configuration property should be added that enables this feature when set > to true. If a user wants to see the uncomputed subtrees, they can simply > disable the configuration property. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30369: - Description: This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: !pruned.png! A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. was: This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. 
> Prune uncomputed children of InMemoryRelation > - > > Key: SPARK-30369 > URL: https://issues.apache.org/jira/browse/SPARK-30369 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: pruned.png, unpruned.png > > > This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in > SQL UI JIRA URL] > Currently with the changes introduced by the JIRAs this follows up on, if a > query persists data that is later read by another query, the uncomputed > subtree of the plan for the persisted data will be shown: > !unpruned.png! > To avoid showing uncomputed subtrees in the query plan (which may become > appreciably large in a situation such as if multiple iterative queries are > run that each use persisted data from the last query), the uncomputed > subtrees could be removed before rendering the query plan: > !pruned.png! > A configuration property should be added that enables this feature when set > to true. If a user wants to see the uncomputed subtrees, they can simply > disable the configuration property. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29013) Structurally equivalent subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-29013. - Resolution: Won't Fix > Structurally equivalent subexpression elimination > - > > Key: SPARK-29013 > URL: https://issues.apache.org/jira/browse/SPARK-29013 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Priority: Major > > We do semantically equivalent subexpression elimination in SparkSQL. However, > for some expressions that are not semantically equivalent, but structurally > equivalent, current subexpression elimination generates too many similar > functions. These functions share same computation structure but only differ > in input slots of current processing row. > For such expressions, we can generate just one function, and pass in input > slots during runtime. > It can reduce the length of generated code text, and save compilation time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
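The idea behind SPARK-29013 can be illustrated outside of Spark's codegen: expressions such as `row[0] + 1` and `row[3] + 1` are not semantically equivalent, but they are structurally equivalent, so one function parameterized by the input slot can replace many near-identical generated functions. This plain-Python sketch (names are illustrative, not Spark internals) contrasts the two approaches:

```python
# Naive codegen analogue: one specialized function per input slot,
# producing many functions that differ only in the hard-coded index.
def make_add_one(slot):
    def f(row):
        return row[slot] + 1
    return f

# Structural-sharing analogue: a single function, with the input slot
# supplied at runtime, shrinking the amount of generated code.
def add_one(row, slot):
    return row[slot] + 1

row = [10, 20, 30, 40]
print(make_add_one(0)(row), add_one(row, 3))  # 11 41
```

In generated Java code the payoff is shorter code text and less compilation work, which is what the issue was aiming for before it was closed as Won't Fix.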
[jira] [Commented] (SPARK-30366) Remove Redundant Information for InMemoryTableScan in SQL UI
[ https://issues.apache.org/jira/browse/SPARK-30366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004310#comment-17004310 ] Max Thompson commented on SPARK-30366: -- Requesting myself as the assignee for all issues under this Epic. > Remove Redundant Information for InMemoryTableScan in SQL UI > > > Key: SPARK-30366 > URL: https://issues.apache.org/jira/browse/SPARK-30366 > Project: Spark > Issue Type: Epic > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > > All the JIRAs within this epic are follow-ups for > https://issues.apache.org/jira/browse/SPARK-29431 > > This epic contains JIRAs for adding features to how InMemoryTableScan > operators and their children are displayed in the SQL tab of the Web UI, > aimed at removing redundant information that may confuse the user as to how > the query was actually executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30369) Prune uncomputed children of InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-30369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Thompson updated SPARK-30369: - Description: This is a follow-up JIRA for: https://issues.apache.org/jira/browse/SPARK-30367 Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: !pruned.png! A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. was: This is a follow-up JIRA for: [De-duplicate InMemoryTableScan cached plans in SQL UI JIRA URL] Currently with the changes introduced by the JIRAs this follows up on, if a query persists data that is later read by another query, the uncomputed subtree of the plan for the persisted data will be shown: !unpruned.png! To avoid showing uncomputed subtrees in the query plan (which may become appreciably large in a situation such as if multiple iterative queries are run that each use persisted data from the last query), the uncomputed subtrees could be removed before rendering the query plan: !pruned.png! A configuration property should be added that enables this feature when set to true. If a user wants to see the uncomputed subtrees, they can simply disable the configuration property. 
> Prune uncomputed children of InMemoryRelation > - > > Key: SPARK-30369 > URL: https://issues.apache.org/jira/browse/SPARK-30369 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Max Thompson >Priority: Minor > Attachments: pruned.png, unpruned.png > > > This is a follow-up JIRA for: > https://issues.apache.org/jira/browse/SPARK-30367 > Currently with the changes introduced by the JIRAs this follows up on, if a > query persists data that is later read by another query, the uncomputed > subtree of the plan for the persisted data will be shown: > !unpruned.png! > To avoid showing uncomputed subtrees in the query plan (which may become > appreciably large in a situation such as if multiple iterative queries are > run that each use persisted data from the last query), the uncomputed > subtrees could be removed before rendering the query plan: > !pruned.png! > A configuration property should be added that enables this feature when set > to true. If a user wants to see the uncomputed subtrees, they can simply > disable the configuration property. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24122) Allow automatic driver restarts on K8s
[ https://issues.apache.org/jira/browse/SPARK-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004323#comment-17004323 ] jinxing commented on SPARK-24122: - I am all in on true native K8s support for Spark. Using Job & Deployment for driver pods sounds like the right direction to me. Then how do you plan to schedule executors? A bare pod? A StatefulSet? > Allow automatic driver restarts on K8s > -- > > Key: SPARK-24122 > URL: https://issues.apache.org/jira/browse/SPARK-24122 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Oz Ben-Ami >Priority: Minor > Labels: bulk-closed > > [~foxish] > Right now SparkSubmit creates the driver as a bare pod, rather than a managed > controller like a Deployment or a StatefulSet. This means there is no way to > guarantee automatic restarts, eg in case a node has an issue. Note Pod > RestartPolicy does not apply if a node fails. A StatefulSet would allow us to > guarantee that, and keep the ability for executors to find the driver using > DNS. > This is particularly helpful for long-running streaming workloads, where we > currently use {{yarn.resourcemanager.am.max-attempts}} with YARN. I can > confirm that Spark Streaming and Structured Streaming applications can be > made to recover from such a restart, with the help of checkpointing. The > executors will have to be started again by the driver, but this should not be > a problem. > For batch processing, we could alternatively use Kubernetes {{Job}} objects, > which restart pods on failure but not success. 
For example, note the > semantics provided by the {{kubectl run}} > [command|https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#run] > * {{--restart=Never}}: bare Pod > * {{--restart=Always}}: Deployment > * {{--restart=OnFailure}}: Job > https://github.com/apache-spark-on-k8s/spark/issues/288 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27764) Feature Parity between PostgreSQL and Spark
[ https://issues.apache.org/jira/browse/SPARK-27764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004360#comment-17004360 ] Takeshi Yamamuro commented on SPARK-27764: -- ok, I'll clean up later. > Feature Parity between PostgreSQL and Spark > --- > > Key: SPARK-27764 > URL: https://issues.apache.org/jira/browse/SPARK-27764 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > PostgreSQL is one of the most advanced open source databases. This umbrella > Jira is trying to track the missing features and bugs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27878) Support ARRAY(sub-SELECT) expressions
[ https://issues.apache.org/jira/browse/SPARK-27878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004361#comment-17004361 ] Takeshi Yamamuro commented on SPARK-27878: -- nvm, it's ok just to reopen if we revisit this ;) > Support ARRAY(sub-SELECT) expressions > - > > Key: SPARK-27878 > URL: https://issues.apache.org/jira/browse/SPARK-27878 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Construct an array from the results of a subquery. In this form, the array > constructor is written with the key word {{ARRAY}} followed by a > parenthesized (not bracketed) subquery. For example: > {code:sql} > SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); > array > --- > {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} > (1 row) > {code} > More details: > > [https://www.postgresql.org/docs/9.3/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS] > [https://github.com/postgres/postgres/commit/730840c9b649a48604083270d48792915ca89233] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
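The `ARRAY(sub-SELECT)` constructor requested in SPARK-27878 collects the single column produced by a subquery into one array value. As a plain-Python sketch of those semantics (not Spark or PostgreSQL code; the tiny `pg_proc` table below is made up for illustration):

```python
# Plain-Python model of: SELECT ARRAY(SELECT oid FROM pg_proc
#                                     WHERE proname LIKE 'bytea%')
# The subquery yields one column of rows; ARRAY() packs it into one value.
pg_proc = [
    {"oid": 2011, "proname": "byteaout"},
    {"oid": 1954, "proname": "byteain"},
    {"oid": 31,   "proname": "textlen"},
]

result = [r["oid"] for r in pg_proc if r["proname"].startswith("bytea")]
print(result)  # [2011, 1954]
```

The key point is that the subquery must be written in parentheses (not brackets) and must return a single column; its rows become the elements of the resulting array.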
[jira] [Created] (SPARK-30370) Update SqlBase.g4 to combine namespace and database tokens.
Terry Kim created SPARK-30370: - Summary: Update SqlBase.g4 to combine namespace and database tokens. Key: SPARK-30370 URL: https://issues.apache.org/jira/browse/SPARK-30370 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Terry Kim Instead of using `(database | NAMESPACE)` in the grammar, create namespace : NAMESPACE | DATABASE | SCHEMA; and use it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
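The proposed SqlBase.g4 change would look roughly like the fragment below: a single `namespace` parser rule replacing the inline `(database | NAMESPACE)` alternatives. This is a sketch of the grammar rule only; the exact usage sites in the statement rules would be settled in the actual patch.

```
namespace
    : NAMESPACE
    | DATABASE
    | SCHEMA
    ;
```

Statement rules that currently spell out the token alternatives (e.g. for `CREATE NAMESPACE` / `CREATE DATABASE` / `CREATE SCHEMA`) could then reference `namespace` instead, keeping the three synonymous keywords in one place.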
[jira] [Resolved] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local
[ https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28670. -- Fix Version/s: 3.0.0 Assignee: Sandeep Katta Resolution: Fixed Resolved by https://github.com/apache/spark/pull/25399 > [UDF] create permanent UDF does not throw Exception if jar does not exist in > HDFS path or Local > --- > > Key: SPARK-28670 > URL: https://issues.apache.org/jira/browse/SPARK-28670 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: Sandeep Katta >Priority: Minor > Fix For: 3.0.0 > > > jdbc:hive2://10.18.18.214:23040/default> create function addm AS > 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar > 'hdfs://hacluster/user/AddDoublesUDF1.jar'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.241 seconds) > 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm > AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar > 'hdfs://hacluster/user/AddDoublesUDF1.jar'; > INFO : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar > ERROR : Failed to read external resource > hdfs://hacluster/user/AddDoublesUDF1.jar > java.lang.RuntimeException: Failed to read external resource > hdfs://hacluster/user/AddDoublesUDF1.jar > at > org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29947) Improve ResolveRelations performance
[ https://issues.apache.org/jira/browse/SPARK-29947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-29947: Summary: Improve ResolveRelations performance (was: Improve ResolveRelations and ResolveTables performance) > Improve ResolveRelations performance > > > Key: SPARK-29947 > URL: https://issues.apache.org/jira/browse/SPARK-29947 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > For SQL in SPARK-29606. The physical plan in: > {noformat} > == Physical Plan == > *(12) HashAggregate(keys=[cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE WHEN > (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN > cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 > END#1367], functions=[sum(coalesce(bid_count#34, 0)), > sum(coalesce(ck_trans_count#35, 0)), sum(coalesce(ended_bid_count#36, 0)), > sum(coalesce(ended_lstg_count#37, 0)), > sum(coalesce(ended_success_lstg_count#38, 0)), > sum(coalesce(item_sold_count#39, 0)), sum(coalesce(new_lstg_count#40, 0)), > sum(coalesce(gmv_us_amt#41, 0.00)), sum(coalesce(gmv_slr_lc_amt#42, 0.00)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_insrtn_fee_us_amt#46, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_insrtn_crd_us_amt#50, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fetr_fee_us_amt#54, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_fetr_crd_us_amt#58, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fv_fee_us_amt#62, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_fv_crd_us_amt#67, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_l_fee_us_amt#72, > 0.00) as decimal(19,6))) + > 
promote_precision(cast(coalesce(rvnu_othr_l_crd_us_amt#76, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_nl_fee_us_amt#80, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_othr_nl_crd_us_amt#84, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_slr_tools_fee_us_amt#88, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_slr_tools_crd_us_amt#92, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > sum(coalesce(rvnu_unasgnd_us_amt#96, 0.00)), > sum((coalesce(rvnu_transaction_us_amt#112, 0.0) + > coalesce(rvnu_transaction_crd_us_amt#115, 0.0))), > sum((coalesce(rvnu_total_us_amt#118, 0.0) + > coalesce(rvnu_total_crd_us_amt#121, 0.0)))]) > +- Exchange hashpartitioning(cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE WHEN > (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN > cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 > END#1367, 200), true, [id=#403] >+- *(11) HashAggregate(keys=[cmn_mtrc_summ_dt#21, rev_rollup#1279, CASE > WHEN (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END AS CASE WHEN > (rev_rollup#1319 = rev_rollup#1279) THEN 0 ELSE 1 END#1366, CASE WHEN > cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 ELSE 1 END AS > CASE WHEN cast(sap_category_id#24 as decimal(10,0)) IN (5,7,23,41) THEN 0 > ELSE 1 END#1367], functions=[partial_sum(coalesce(bid_count#34, 0)), > partial_sum(coalesce(ck_trans_count#35, 0)), > partial_sum(coalesce(ended_bid_count#36, 0)), > partial_sum(coalesce(ended_lstg_count#37, 0)), > partial_sum(coalesce(ended_success_lstg_count#38, 0)), > partial_sum(coalesce(item_sold_count#39, 0)), > partial_sum(coalesce(new_lstg_count#40, 0)), > partial_sum(coalesce(gmv_us_amt#41, 0.00)), > partial_sum(coalesce(gmv_slr_lc_amt#42, 0.00)), > partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_insrtn_fee_us_amt#46, > 0.00) 
as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_insrtn_crd_us_amt#50, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fetr_fee_us_amt#54, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_fetr_crd_us_amt#58, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_fv_fee_us_amt#62, > 0.00) as decimal(19,6))) + > promote_precision(cast(coalesce(rvnu_fv_crd_us_amt#67, 0.00) as > decimal(19,6, DecimalType(19,6), true)), > partial_sum(CheckOverflow((promote_precision(cast(coalesce(rvnu_othr_l_fee_us_amt#72, > 0.00) as de