[ 
https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766653#comment-16766653
 ] 

Jungtaek Lim edited comment on SPARK-26777 at 2/13/19 2:13 AM:
---------------------------------------------------------------

I'd rather concern about EMR vs vanilla (Apache version), and if vanilla has 
same issue (Spark 2.3.x succeeds and 2.4.0 fails) IMHO we may need to be able 
to say it's a bug.

Not everyone can spend their time to investigate the issue they face - 
sometimes they don't just want to do (then I'd rather say they need to consult 
to some commercial company providing technical service on that project) but 
sometimes they cannot to what to do to narrow down the issue because they're 
not expert. IMHO it's ideal to differentiate them (if possible, so I said 
'ideal') and let it open for the latter case to open the chance for some 
volunteers to help, or otherwise let some experts provide some evidences/hints 
to let end users be able to investigate. But that's just my 2 cents and I 
understand how many false alarming issues are being filed to Spark:  thanks for 
the amazing efforts to sort out them steadily.


was (Author: kabhwan):
I'd rather concern about EMR vs vanilla (Apache version), and if vanilla has 
same issue (Spark 2.3.x succeeds and 2.4.0 fails) IMHO we may need to be able 
to say it's a bug.

Not everyone can spend their time to investigate the issue they face - 
sometimes they don't just want to do (then I'd rather say they need to consult 
to some commercial company) but sometimes they cannot to what to do to narrow 
down the issue because they're not expert. IMHO it's ideal to differentiate 
them (if possible, so I said 'ideal') and let it open for the latter case to 
open the chance for some volunteers to help, or otherwise let some experts 
provide some evidences/hints to let end users be able to investigate. But 
that's just my 2 cents and I understand how many false alarming issues are 
being filed to Spark:  thanks for the amazing efforts to sort out them steadily.

> SQL worked in 2.3.2 and fails in 2.4.0
> --------------------------------------
>
>                 Key: SPARK-26777
>                 URL: https://issues.apache.org/jira/browse/SPARK-26777
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yuri Budilov
>            Priority: Major
>
> Following SQL worked in Spark 2.3.2 and now fails on 2.4.0 (AWS EMR Spark)
> ------------ PySpark call below:
> spark.sql("select partition_year_utc,partition_month_utc,partition_day_utc \
> from datalake_reporting.copy_of_leads_notification \
> where partition_year_utc = (select max(partition_year_utc) from 
> datalake_reporting.copy_of_leads_notification) \
> and partition_month_utc = \
>  (select max(partition_month_utc) from 
> datalake_reporting.copy_of_leads_notification as m \
>  where \
>  m.partition_year_utc = (select max(partition_year_utc) from 
> datalake_reporting.copy_of_leads_notification)) \
>  and partition_day_utc = (select max(d.partition_day_utc) from 
> datalake_reporting.copy_of_leads_notification as d \
>  where d.partition_month_utc = \
>  (select max(m1.partition_month_utc) from 
> datalake_reporting.copy_of_leads_notification as m1 \
>  where m1.partition_year_utc = \
>  (select max(y.partition_year_utc) from 
> datalake_reporting.copy_of_leads_notification as y) \
>  ) \
>  ) \
>  order by 1 desc, 2 desc, 3 desc limit 1 ").show(1,False)
> Error: (no need for data, this is syntax).
> py4j.protocol.Py4JJavaError: An error occurred while calling o1326.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: 
> scalar-subquery#4495 []
>  
> Note: all 3 columns in query are Partitioned columns - see bottom of the 
> schema)
>  
> Hive EMR AWS Schema is:
>  
> CREATE EXTERNAL TABLE `copy_of_leads_notification`(
> `message.environment.siteorigin` string, `dcpheader.dcploaddateutc` string, 
> `message.id` int, `source.properties._country` string, `message.created` 
> string, `dcpheader.generatedmessageid` string, `message.tags` bigint, 
> `source.properties._enqueuedtimeutc` string, `source.properties._leadtype` 
> string, `message.itemid` string, `message.prospect.postcode` string, 
> `message.prospect.email` string, `message.referenceid` string, 
> `message.item.year` string, `message.identifier` string, 
> `dcpheader.dcploadmonthutc` string, `message.processed` string, 
> `source.properties._tenant` string, `message.item.price` string, 
> `message.subscription.confirmresponse` boolean, `message.itemtype` string, 
> `message.prospect.lastname` string, `message.subscription.insurancequote` 
> boolean, `source.exchangename` string, 
> `message.prospect.identificationnumbers` bigint, 
> `message.environment.ipaddress` string, `dcpheader.dcploaddayutc` string, 
> `source.properties._itemtype` string, `source.properties._requesttype` 
> string, `message.item.make` string, `message.prospect.firstname` string, 
> `message.subscription.survey` boolean, `message.prospect.homephone` string, 
> `message.extendedproperties` bigint, `message.subscription.financequote` 
> boolean, `message.uniqueidentifier` string, `source.properties._id` string, 
> `dcpheader.sourcemessageguid` string, `message.requesttype` string, 
> `source.routingkey` string, `message.service` string, `message.item.model` 
> string, `message.environment.pagesource` string, `source.source` string, 
> `message.sellerid` string, `partition_date_utc` string, 
> `message.selleridentifier` string, `message.subscription.newsletter` boolean, 
> `dcpheader.dcploadyearutc` string, `message.leadtype` string, 
> `message.history` bigint, `message.callconnect.calloutcome` string, 
> `message.callconnect.datecreatedutc` string, 
> `message.callconnect.callrecordingurl` string, 
> `message.callconnect.transferoutcome` string, 
> `message.callconnect.hiderecording` boolean, 
> `message.callconnect.callstartutc` string, `message.callconnect.code` string, 
> `message.callconnect.callduration` string, `message.fraudnetinfo` string, 
> `message.callconnect.answernumber` string, `message.environment.sourcedevice` 
> string, `message.comments` string, `message.fraudinfo.servervariables` 
> bigint, `message.callconnect.servicenumber` string, 
> `message.callconnect.callid` string, `message.callconnect.voicemailurl` 
> string, `message.item.stocknumber` string, 
> `message.callconnect.answerduration` string, `message.callconnect.callendutc` 
> string, `message.item.series` string, `message.item.detailsurl` string, 
> `message.item.pricetype` string, `message.item.description` string, 
> `message.item.colour` string, `message.item.badge` string, 
> `message.item.odometer` string, `message.environment.requestheader` string, 
> `message.item.registrationnumber` string, `message.item.bodytype` string, 
> `message.item.fueltype` string, `message.item.redbookcode` string, 
> `message.item.spotid` string, `message.item.id` string, 
> `message.item.transmission` string, `message.item.vin` string, 
> `message.item.enginedescription` string, `message.prospect.mobilephone` 
> string, `message.prospect.membertrackingid` string, 
> `message.environment.username` string, `message.prospect.workphone` string, 
> `message.environment.servername` string, `message.environment.sessionid` 
> string, `message.tradein.type` string, `message.tradein.model` string, 
> `message.tradein.year` string, `message.tradein.make` string, 
> `message.tradein.kms` string, `message.fraudinfo.servertimestamp` string, 
> `message.prospect.suburb` string, `message.callconnect.username` string, 
> `message.callconnect.password` string, `message.status` string, 
> `message.tradein.colour` string, `message.prospect.address` string, 
> `message.prospect.state` string, `message.tradein.detailsurl` string, 
> `message.prospect.faxnumber` string, `message.prospect.companyname` string, 
> `message.prospect.title` string, `message.callconnect.calloutcometext` 
> string, `message.prospect.preferredcontacttime` string, 
> `message.fraudinfo.devicedata` string, 
> `message.prospect.preferredcontactmethod` string, 
> `message.assignment.assigned` string, `message.assignment.email` string, 
> `message.assignment.name` string, `message.country` string, 
> `message.financepackage.id` string, `message.financepackage.version` string)
> PARTITIONED BY ( `partition_year_utc` string, `partition_month_utc` string, 
> `partition_day_utc` string, `job_run_guid` string)
> ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION 's3://datalake/yurib_test/leads_notification'
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to