[jira] [Closed] (SPARK-12095) Window function rowsBetween throws exception
[ https://issues.apache.org/jira/browse/SPARK-12095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Irakli Machabeli closed SPARK-12095.
------------------------------------

Ignore, initially I was testing on Windows without Hive, so HiveContext was not available.

> Window function rowsBetween throws exception
> --------------------------------------------
>
>                 Key: SPARK-12095
>                 URL: https://issues.apache.org/jira/browse/SPARK-12095
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Irakli Machabeli
>
> From pyspark:
>     windowSpec=Window.partitionBy('A', 'B').orderBy('A','B', 'C').rowsBetween('UNBOUNDED PRECEDING','CURRENT')
> Py4JError: An error occurred while calling o1107.rowsBetween. Trace:
> py4j.Py4JException: Method rowsBetween([class java.lang.String, class java.lang.Long]) does not exist
>
> From SQL query, parser fails immediately:
> Py4JJavaError: An error occurred while calling o18.sql.
> : java.lang.RuntimeException: [1.20] failure: ``union'' expected but `(' found
> select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl
>                    ^
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12095) Window function rowsBetween throws exception
[ https://issues.apache.org/jira/browse/SPARK-12095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083833#comment-15083833 ]

Irakli Machabeli commented on SPARK-12095:
------------------------------------------

It is mentioned briefly in the API docs: "Note Window functions is only supported with HiveContext in 1.4"
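
For readers unfamiliar with the frame in this report: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW covers every row from the start of the partition up to and including the current one. The sketch below emulates that frame in plain Python as a running sum. It is only an illustration of the semantics, not Spark code; the function name running_sum and the sample rows are made up for this example. As far as the PySpark API docs go, rowsBetween takes integer offsets (an unbounded start is expressed as -sys.maxsize, the current row as 0) rather than SQL keyword strings, which would be consistent with the "Method rowsBetween([class java.lang.String, ...]) does not exist" message in the report.

```python
from itertools import groupby
from operator import itemgetter


def running_sum(rows, partition_key, order_key, value_key):
    """Emulate SUM(value) OVER (PARTITION BY p ORDER BY o
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) on a list of dicts.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    out = []
    # Sort by partition first, then by the ordering column within it.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, part in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in part:
            # The frame is "everything so far in this partition", so the
            # aggregate is an accumulator that resets for each partition.
            total += row[value_key]
            out.append(dict(row, running=total))
    return out


# Made-up sample data: two partitions ("x" and "y") ordered by column C.
rows = [
    {"A": "x", "C": 2, "v": 10},
    {"A": "x", "C": 1, "v": 5},
    {"A": "y", "C": 1, "v": 7},
]
result = running_sum(rows, "A", "C", "v")
for r in result:
    print(r)
```

The accumulator restarts at each partition boundary, which is the part that PARTITION BY contributes; ORDER BY only decides which rows count as "preceding".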
[jira] [Created] (SPARK-12467) Get rid of sorting in Row's constructor in pyspark
Irakli Machabeli created SPARK-12467:
----------------------------------------

             Summary: Get rid of sorting in Row's constructor in pyspark
                 Key: SPARK-12467
                 URL: https://issues.apache.org/jira/browse/SPARK-12467
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.5.2
            Reporter: Irakli Machabeli
            Priority: Minor

The current implementation of Row's __new__ sorts columns by name.
First of all, there is no obvious reason to sort; second, if one converts a dataframe to an rdd and then back to a dataframe, the order of the columns changes. While this is not a bug, it nevertheless makes looking at the data really inconvenient.

    def __new__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args "
                             "and kwargs to create Row")
        if args:
            # create row class or objects
            return tuple.__new__(self, args)

        elif kwargs:
            # create row objects
            names = sorted(kwargs.keys())  # just get rid of sorting here!!!
            row = tuple.__new__(self, [kwargs[n] for n in names])
            row.__fields__ = names
            return row

        else:
            raise ValueError("No args or kwargs")
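
A small self-contained sketch of the behaviour described in this report. MiniRow is a hypothetical stand-in for pyspark's Row, not the real class, and it reproduces only the kwargs branch; the _sort flag is invented here to compare the two behaviours side by side. The unsorted variant relies on keyword arguments keeping their call order, which Python guarantees from 3.6 onward (PEP 468); on older interpreters that order was arbitrary, which may be why the original code sorts.

```python
class MiniRow(tuple):
    """Hypothetical stand-in for pyspark.sql.Row (kwargs branch only)."""

    def __new__(cls, _sort=True, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        # _sort=True mirrors the reported behaviour (fields sorted by name);
        # _sort=False keeps the order in which the caller passed the fields.
        names = sorted(kwargs) if _sort else list(kwargs)
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = MiniRow(name="Alice", age=11)
ordered_row = MiniRow(_sort=False, name="Alice", age=11)

print(sorted_row.__fields__, tuple(sorted_row))
print(ordered_row.__fields__, tuple(ordered_row))
```

With sorting, the fields come back as age, name regardless of how the row was written; without it, they stay as name, age.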
[jira] [Created] (SPARK-12377) Wrong implementation for Row.__call__ in pyspark
Irakli Machabeli created SPARK-12377:
----------------------------------------

             Summary: Wrong implementation for Row.__call__ in pyspark
                 Key: SPARK-12377
                 URL: https://issues.apache.org/jira/browse/SPARK-12377
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
            Reporter: Irakli Machabeli

Current code

    def __call__(self, *args):
        """create new Row object"""
        return _create_row(self, args)

has to be

    def __call__(self, *args):
        """create new Row object"""
        return _create_row(self.__fields__, args)
[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052759#comment-15052759 ]

Irakli Machabeli commented on SPARK-12218:
------------------------------------------

The bug itself is really dangerous. It would be OK if it were simply crashing, throwing an exception, etc., but it silently produces wrong results. Imagine coding in Java and having to worry whether the compiler correctly interprets && and || in an if statement; that would be a disaster.
For me this is not critical: I'm still in try-out mode and can always upgrade to 1.6. But for someone who uses Spark 1.5 for a real job, that's really bad.

> Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12218
>                 URL: https://issues.apache.org/jira/browse/SPARK-12218
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Irakli Machabeli
>            Priority: Blocker
>
> Two identical queries produce different results
> In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").count()
> Out[2]: 18
> In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff')))").count()
> Out[3]: 28
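
For reference, the two predicates in the report should select the same rows under SQL's three-valued logic, NULLs included, since De Morgan's laws still hold there. The sketch below checks this with Python's built-in sqlite3 module as a stand-in engine. It says nothing about what Spark does internally; the table name loans and all of its rows are invented for illustration, and only the column names and the two WHERE clauses come from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (LoanID INTEGER, PaymentsReceived REAL, ExplicitRoll TEXT)"
)
# Made-up rows covering every TRUE/FALSE/NULL combination of
# A = (PaymentsReceived = 0) and B = (ExplicitRoll IN (...)).
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        (62231, 0.0, "PreviouslyPaidOff"),      # A true,  B true
        (62231, 0.0, "Current"),                # A true,  B false
        (62231, 12.5, "PreviouslyChargedOff"),  # A false, B true
        (62231, 12.5, "Current"),               # A false, B false
        (62231, None, "PreviouslyPaidOff"),     # A NULL,  B true
        (62231, 0.0, None),                     # A true,  B NULL
        (62231, None, None),                    # A NULL,  B NULL
        (62231, None, "Current"),               # A NULL,  B false
        (62231, 12.5, None),                    # A false, B NULL
        (1, 0.0, "Current"),                    # other loan, always filtered out
    ],
)

# Form 1: NOT (A AND B), as in In [2] of the report.
not_a_and_b = """
    SELECT COUNT(*) FROM loans
    WHERE LoanID = 62231
      AND NOT (PaymentsReceived = 0
               AND ExplicitRoll IN ('PreviouslyPaidOff', 'PreviouslyChargedOff'))
"""
# Form 2: (NOT A) OR (NOT B), as in In [3] of the report.
not_a_or_not_b = """
    SELECT COUNT(*) FROM loans
    WHERE LoanID = 62231
      AND (NOT (PaymentsReceived = 0)
           OR NOT (ExplicitRoll IN ('PreviouslyPaidOff', 'PreviouslyChargedOff')))
"""

count_1 = conn.execute(not_a_and_b).fetchone()[0]
count_2 = conn.execute(not_a_or_not_b).fetchone()[0]
print(count_1, count_2)
```

Both counts agree on this data, so a difference such as 18 versus 28 cannot be explained by NULL handling in the predicate alone.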
[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli commented on SPARK-12218:
------------------------------------------

I'm afraid I don't really know what that means, "plan by explain(true)". Shall I type it in the REPL?

> [ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047928#comment-15047928 ]
>
> Xiao Li commented on SPARK-12218:
> ---------------------------------
>
> Could you provide the plan by explain(true)? [~imachabeli] Thanks!
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:28 PM:
-------------------------------------------------------------------

Below is the explain plan.
To make it clear, the query that contains not (A and B):
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))
{code}
produces wrong results, and the query that is already expanded as (not A) or (not B) produces correct output.

By the way, I saw cast(0 as double) in the explain plan, so I tried to change 0 => 0.0, but it made no difference.

Physical plan looks similar:
{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results:
{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:40 PM:
-------------------------------------------------------------------

Below is the explain plan.
To make it clear, the query that contains not (A and B):
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))
{code}
produces wrong results, and the query that is already expanded as (not A) or (not B) produces correct output.

The physical plans look like this:
{code}
wrong results --
Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = 0.0) && ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
correct results --
Filter ((LoanID#8803 = 62231) && (NOT (PaymentsReceived#8816 = 0.0) || NOT ExplicitRoll#8826 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results:
{code}
Wrong:
In [15]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0.0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0.0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Relation[BorrowerRate#8625,MnthRate#8626,ObservationMonth#8627,CycleCounter#8628,LoanID#8629,Loankey#8630,OriginationDate#8631,OriginationQuarter#8632,LoanAmount#8633,Term#8634,LenderRate#8635,ProsperRating#8636,ScheduledMonthlyPaymentAmount#8637,ChargeoffMonth#8638,ChargeoffAmount#8639,CompletedMonth#8640,MonthOfLastPayment#8641,PaymentsReceived#8642,CollectionFees#8643,PrincipalPaid#8644,InterestPaid#8645,LateFees#8646,ServicingFees#8647,RecoveryPayments#8648,RecoveryPrin#8649,DaysPastDue#8650,PriorMonthDPD#8651,ExplicitRoll#8652,SummaryRoll#8653,CumulPrin#8654,EOMPrin#8655,ScheduledPrinRemaining#8656,ScheduledCumulPrin#8657,ScheduledPeriodicPrin#8658,BOMPrin#8659,ListingNumber#8660,DebtSaleMonth#8661,GrossCashFromDebtSale#8662,DebtSaleFee#8663,NetCashToInvestorsFromDebtSale#8664,OZVintage#8665] ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = cast(0.0 as double)) && ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff))) 
Relation[BorrowerRate#8625,MnthRate#8626,ObservationMonth#8627,CycleCounter#8628,LoanID#8629,Loankey#8630,OriginationDate#8631,OriginationQuarter#8632,LoanAmount#8633,Term#8634,LenderRate#8635,ProsperRating#8636,ScheduledMonthlyPaymentAmount#8637,ChargeoffMonth#8638,ChargeoffAmount#8639,CompletedMonth#8640,MonthOfLastPayment#8641,PaymentsReceived#8642,CollectionFees#8643,PrincipalPaid#8644,InterestPaid#8645,LateFees#8646,ServicingFees#8647,RecoveryPayments#8648,RecoveryPrin#8649,DaysPastDue#8650,PriorMonthDPD#8651,ExplicitRoll#8652,SummaryRoll#8653,CumulPrin#8654,EOMPrin#8655,ScheduledPrinRemaining#8656,ScheduledCumulPrin#8657,ScheduledPeriodicPrin#8658,BOMPrin#8659,ListingNumber#8660,DebtSaleMonth#8661,GrossCashFromDebtSale#8662,DebtSaleFee#8663,NetCashToInvestorsFromDebtSale#8664,OZVintage#8665] ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = 0.0) && ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:27 PM:
-------------------------------------------------------------------

Below is the explain plan.
To make it clear, query that contains not (A and B) :
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))")
{code}
produces wrong results, and query that is already expanded as (not A) or (not B) produces correct output.

By the way I saw in explain plan cast(0 as double)) so I tried to change 0 => 0.0 but no difference.

Physical plan looks similar:
{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results:
{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ] Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 1:47 PM: --- In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True) == Parsed Logical Plan == 'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff))) Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: 
int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Physical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) Scan
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:22 PM:
-------------------------------------------------------------------

Below is the explain plan.
To make it clear query that contains
{code}
"and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))")"
{code}
produces wrong results, one that is already expanded as (not A) or (not B) produces correct output.

Physical plan looks similar:
{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results:
{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] 
ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619 ]

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:25 PM:
-------------------------------------------------------------------

Below is the explain plan.
To make it clear, query that contains not (A and B) :
{code}
"and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))")"
{code}
produces wrong results, and query that is already expanded as (not A) or (not B) produces correct output.

By the way I saw in explain plan cast(0 as double)) so I tried to change 0 => 0.0 but no difference.

Physical plan looks similar:
{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results:
{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Analyzed Logical Plan == BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, InterestPaid: double, LateFees: double, ServicingFees: double, RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, NetCashToInvestorsFromDebtSale: double, OZVintage: string Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff))) 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583] ParquetRelation[file:/d:/MktLending/prp_enh1] == Optimized Logical Plan == Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049194#comment-15049194 ]

Irakli Machabeli commented on SPARK-12218:
--

{code}
scala> val df = sqlContext.read.parquet(pathOne).where("c < 6 and not (a = 2 and b in ('1', '2'))")
df: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]
scala> df.explain(true)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && NOT (('a = 2) && 'b IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Optimized Logical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Physical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Scan ParquetRelation[file:/D:/tmp/test][a#30,b#31,c#32]
Code Generation: true
{noformat}
{code}
scala> val df2 = sqlContext.read.parquet(pathOne).where("c < 6 and (not(a = 2) or not(b in ('1', '2')))")
df2: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]
scala> df2.explain(true)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && (NOT ('a = 2) || NOT 'b IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Optimized Logical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Physical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Scan ParquetRelation[file:/D:/tmp/test][a#34,b#35,c#36]
Code Generation: true
{noformat}

> Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
>
> Key: SPARK-12218
> URL: https://issues.apache.org/jira/browse/SPARK-12218
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Irakli Machabeli
> Priority: Blocker
>
> Two identical queries produce different results
> In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").count()
> Out[2]: 18
> In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff')))").count()
> Out[3]: 28

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
Irakli Machabeli created SPARK-12218:

Summary: Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"
Key: SPARK-12218
URL: https://issues.apache.org/jira/browse/SPARK-12218
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.2
Reporter: Irakli Machabeli
Priority: Blocker

Two identical queries produce different results
In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").count()
Out[2]: 18
In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff')))").count()
Out[3]: 28
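For reference, the two predicates in the report are equivalent by De Morgan's law, and the equivalence also holds under SQL's three-valued logic, so NULLs in PaymentsReceived or ExplicitRoll would not by themselves account for the 18 versus 28 difference. A minimal sketch in plain Python (no Spark involved), where None stands in for SQL NULL and the helper names sql_not, sql_and, sql_or and de_morgan_mismatches are made up for illustration:

```python
from itertools import product

# None stands in for SQL NULL (unknown); True/False are ordinary booleans.
TRUTH_VALUES = (True, False, None)


def sql_not(x):
    # NOT NULL is NULL; otherwise plain negation.
    return None if x is None else (not x)


def sql_and(x, y):
    # FALSE dominates AND; otherwise NULL propagates.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_or(x, y):
    # TRUE dominates OR; otherwise NULL propagates.
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def de_morgan_mismatches():
    """Return every (a, b) where NOT (a AND b) differs from (NOT a) OR (NOT b)."""
    mismatches = []
    for a, b in product(TRUTH_VALUES, repeat=2):
        lhs = sql_not(sql_and(a, b))
        rhs = sql_or(sql_not(a), sql_not(b))
        # True, False and None are singletons, so identity comparison is safe
        # and avoids treating None specially.
        if lhs is not rhs:
            mismatches.append((a, b, lhs, rhs))
    return mismatches


if __name__ == "__main__":
    print(de_morgan_mismatches())
```

An empty list means both forms agree on all nine combinations of TRUE, FALSE and NULL, so a WHERE clause should keep the same rows for either form.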
[jira] [Created] (SPARK-12095) Window function rowsBetween throws exception
Irakli Machabeli created SPARK-12095:

Summary: Window function rowsBetween throws exception
Key: SPARK-12095
URL: https://issues.apache.org/jira/browse/SPARK-12095
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.1
Reporter: Irakli Machabeli

From pyspark:
windowSpec=Window.partitionBy('A', 'B').orderBy('A','B', 'C').rowsBetween('UNBOUNDED PRECEDING','CURRENT')
Py4JError: An error occurred while calling o1107.rowsBetween. Trace:
py4j.Py4JException: Method rowsBetween([class java.lang.String, class java.lang.Long]) does not exist

From a SQL query, the parser fails immediately:
Py4JJavaError: An error occurred while calling o18.sql.
: java.lang.RuntimeException: [1.20] failure: ``union'' expected but `(' found
select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl
                   ^
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
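A note on the first call: as far as I understand the PySpark API of that era, rowsBetween takes integer row offsets relative to the current row (0 is the current row, negative values precede it, and -sys.maxsize or lower means unbounded) rather than SQL keyword strings, which would be consistent with the "Method rowsBetween([class java.lang.String, ...]) does not exist" message. Under that assumption the spec would presumably be written as Window.partitionBy('A', 'B').orderBy('A', 'B', 'C').rowsBetween(-sys.maxsize, 0). The sketch below is plain Python, not Spark: it only emulates which rows such a frame selects, using hypothetical helpers frame_bounds and windowed_sum and made-up rows.

```python
import sys
from itertools import groupby
from operator import itemgetter


def frame_bounds(i, n, start, end):
    """Clamp a (start, end) row-offset frame around row i to the range [0, n - 1]."""
    lo = 0 if start <= -sys.maxsize else max(0, i + start)
    hi = n - 1 if end >= sys.maxsize else min(n - 1, i + end)
    return lo, hi


def windowed_sum(rows, partition_key, order_key, value_key, start, end):
    """Emulate SUM(value) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN start AND end)."""
    out = []
    # Sort by partition, then by the ordering column, so groupby sees whole partitions.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        part = list(group)
        for i, row in enumerate(part):
            lo, hi = frame_bounds(i, len(part), start, end)
            total = sum(r[value_key] for r in part[lo:hi + 1])
            out.append((row[partition_key], row[order_key], total))
    return out


# Made-up sample data: partition column A, ordering column C, value column v.
ROWS = [
    {"A": "x", "C": 1, "v": 10},
    {"A": "x", "C": 2, "v": 20},
    {"A": "y", "C": 1, "v": 5},
    {"A": "x", "C": 3, "v": 30},
]

if __name__ == "__main__":
    # Unbounded preceding to current row: a running total per partition.
    print(windowed_sum(ROWS, "A", "C", "v", -sys.maxsize, 0))
```

With start=-sys.maxsize and end=0 the frame grows from the first row of each partition to the current row, which is what 'UNBOUNDED PRECEDING' to 'CURRENT' was meant to express.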