[jira] [Closed] (SPARK-12095) Window function rowsBetween throws exception

2016-01-05 Thread Irakli Machabeli (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Irakli Machabeli closed SPARK-12095.


Please ignore this. I was initially testing on Windows without Hive, so 
HiveContext was not available.

> Window function rowsBetween throws exception
> 
>
> Key: SPARK-12095
> URL: https://issues.apache.org/jira/browse/SPARK-12095
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Irakli Machabeli
>
> From pyspark :
>  windowSpec=Window.partitionBy('A', 'B').orderBy('A','B', 
> 'C').rowsBetween('UNBOUNDED PRECEDING','CURRENT')
> Py4JError: An error occurred while calling o1107.rowsBetween. Trace:
> py4j.Py4JException: Method rowsBetween([class java.lang.String, class 
> java.lang.Long]) does not exist
> from SQL query parser fails immediately:
> Py4JJavaError: An error occurred while calling o18.sql.
> : java.lang.RuntimeException: [1.20] failure: ``union'' expected but `(' found
> select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl
>^
> at scala.sys.package$.error(package.scala:27)
> at 
> org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
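
For reference, the Py4J message above reports that no `rowsBetween` overload taking a string argument exists. As far as I know, PySpark's `rowsBetween(start, end)` expects integer offsets relative to the current row (0 is the current row, negative values are preceding rows), with a very large negative number such as `-sys.maxsize` standing in for "unbounded preceding". The pure-Python sketch below only illustrates that frame convention on a plain list; `running_frame_sum` is a hypothetical helper, not a Spark API.

```python
import sys


def running_frame_sum(values, start, end):
    """Sum each row's frame [i + start, i + end], clipped to the partition.

    `values` is one already-ordered partition; `start` and `end` are
    integer row offsets relative to the current row (0 = current row).
    Hypothetical helper for illustration only; not a Spark API.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i + start)      # clip "unbounded preceding" to row 0
        hi = min(n - 1, i + end)    # clip following rows to the last row
        out.append(sum(values[lo:hi + 1]))
    return out


# Frame "unbounded preceding .. current row", expressed with integers.
cumulative = running_frame_sum([1, 2, 3, 4], -sys.maxsize, 0)
# Frame "1 preceding .. 1 following".
centered = running_frame_sum([1, 2, 3, 4], -1, 1)
```

Here `cumulative` is a running total and `centered` is a three-row moving sum, which is the kind of frame the string arguments in the report were presumably meant to describe.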



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12095) Window function rowsBetween throws exception

2016-01-05 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083833#comment-15083833
 ] 

Irakli Machabeli commented on SPARK-12095:
--

This is mentioned briefly in the API docs:
"Note Window functions is only supported with HiveContext in 1.4"







[jira] [Created] (SPARK-12467) Get rid of sorting in Row's constructor in pyspark

2015-12-21 Thread Irakli Machabeli (JIRA)
Irakli Machabeli created SPARK-12467:


 Summary: Get rid of sorting in Row's constructor in pyspark
 Key: SPARK-12467
 URL: https://issues.apache.org/jira/browse/SPARK-12467
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 1.5.2
Reporter: Irakli Machabeli
Priority: Minor


The current implementation of Row's __new__ sorts columns by name.
First, there is no obvious reason to sort. Second, if one converts a 
DataFrame to an RDD and then back to a DataFrame, the order of the columns 
changes. While this is not a bug, it nevertheless makes looking at the data 
really inconvenient.



    def __new__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args "
                             "and kwargs to create Row")
        if args:
            # create row class or objects
            return tuple.__new__(self, args)

        elif kwargs:
            # create row objects
            names = sorted(kwargs.keys()) # just get rid of sorting here!!!
            row = tuple.__new__(self, [kwargs[n] for n in names])
            row.__fields__ = names
            return row

        else:
            raise ValueError("No args or kwargs")
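
The effect of the `sorted` call can be reproduced without Spark. The sketch below uses a hypothetical `MiniRow` tuple subclass (a simplified stand-in, not PySpark's `Row`) with a `sort_fields` switch that exists only for this illustration. It relies on keyword-argument order being preserved, which Python guarantees only from 3.6 onward (PEP 468); on older interpreters the unsorted order would be arbitrary, which may be one reason the sort was there.

```python
class MiniRow(tuple):
    """Hypothetical, simplified stand-in for a keyword-built row."""

    sort_fields = True  # mimics the sorted() call quoted above

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        # sorted() reorders columns by name; list() keeps call order.
        names = sorted(kwargs) if cls.sort_fields else list(kwargs)
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class OrderedMiniRow(MiniRow):
    """Same stand-in with the sort removed, as the report proposes."""

    sort_fields = False


sorted_row = MiniRow(name="Alice", age=11)
ordered_row = OrderedMiniRow(name="Alice", age=11)
```

In this sketch `sorted_row` comes out with `age` before `name`, while `ordered_row` keeps the order in which the columns were passed.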







[jira] [Created] (SPARK-12377) Wrong implementation for Row.__call__ in pyspark

2015-12-16 Thread Irakli Machabeli (JIRA)
Irakli Machabeli created SPARK-12377:


 Summary: Wrong implementation for Row.__call__ in pyspark
 Key: SPARK-12377
 URL: https://issues.apache.org/jira/browse/SPARK-12377
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Reporter: Irakli Machabeli


Current code

    def __call__(self, *args):
        """create new Row object"""
        return _create_row(self, args)


has to be 

    def __call__(self, *args):
        """create new Row object"""
        return _create_row(self.__fields__, args)
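
One way to read this report is that calling a row which already carries field names passes the row's values, rather than its names, as the new field names. The sketch below reproduces that reading on a hypothetical `SketchRow` class with a `_create_row` helper modelled on the snippet above. It is a simplified stand-in and has not been checked against the actual PySpark source.

```python
def _create_row(fields, values):
    """Build a row from values and attach the given field names."""
    row = SketchRow(*values)
    row.__fields__ = fields
    return row


class SketchRow(tuple):
    """Hypothetical stand-in: positional args become the tuple items."""

    def __new__(cls, *args):
        return tuple.__new__(cls, args)

    def __call__(self, *args):
        """create new row object (same shape as the 'current code' above)"""
        return _create_row(self, args)


# A template row holds field names; calling it fills in values.
Person = SketchRow("name", "age")
alice = Person("Alice", 11)

# Calling a populated row passes its *values* as the field names.
bob = alice("Bob", 12)
```

In this sketch `bob.__fields__` ends up as `("Alice", 11)`. Note that the template `Person` has no `__fields__` attribute here, so a change along the proposed lines would presumably need to handle both the template and the populated case.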






[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-11 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052759#comment-15052759
 ] 

Irakli Machabeli commented on SPARK-12218:
--

The bug itself is really dangerous. It would be OK if it simply crashed or 
threw an exception, but it silently produces wrong results. 
Imagine coding in Java and having to worry about whether the compiler 
correctly interprets && and || in an if statement; that would be a disaster.
For me this is not critical: I'm still in try-out mode and can always upgrade 
to 1.6. But for someone who uses Spark 1.5 for a real job, it's really bad.

> Boolean logic in sql does not work  "not (A and B)" is not the same as  "(not 
> A) or (not B)"
> 
>
> Key: SPARK-12218
> URL: https://issues.apache.org/jira/browse/SPARK-12218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Irakli Machabeli
>Priority: Blocker
>
> Two identical queries produce different results
> In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
> PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
> 'PreviouslyChargedOff'))").count()
> Out[2]: 18
> In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( 
> not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 
> 'PreviouslyChargedOff')))").count()
> Out[3]: 28
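
The two predicates are equivalent by De Morgan's law, and this holds even under SQL's three-valued logic, where a comparison with NULL yields unknown. The sketch below checks this exhaustively in plain Python, using `None` for unknown; the helper names `not3`, `and3`, and `or3` are made up for this illustration and are not Spark functions.

```python
from itertools import product


def not3(a):
    """Three-valued NOT: unknown (None) stays unknown."""
    return None if a is None else (not a)


def and3(a, b):
    """Three-valued AND: False dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: True dominates, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


# Compare NOT (A AND B) with (NOT A) OR (NOT B) for all nine combinations.
mismatches = [
    (a, b)
    for a, b in product([True, False, None], repeat=2)
    if not3(and3(a, b)) != or3(not3(a), not3(b))
]
```

`mismatches` is empty, so under standard SQL semantics a WHERE clause should keep the same rows for both forms, and NULL handling alone would not explain the differing counts (18 vs. 28).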






[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli commented on SPARK-12218:
--

I'm afraid I don't really know what "plan by explain(true)" means.
Shall I type it in the REPL?

[
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047928#comment-15047928
]

Xiao Li commented on SPARK-12218:
-

Could you provide the plan by explain(true)? [~imachabeli] Thanks!








[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:28 PM:
---

Below is the explain plan.
To be clear, the query that contains not (A and B):
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))
{code}  
produces wrong results, 
whereas the query that is already expanded as (not A) or (not B) produces 
correct output.
By the way, I saw cast(0 as double) in the explain plan, so I tried changing 
0 to 0.0, but it made no difference.


Physical plan looks similar:

{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll 
IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT 
ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results: 

{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))").explain(True)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN 
(PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as 
double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 

[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:40 PM:
---

Below is the explain plan.
To be clear, the query that contains not (A and B):
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))
{code}  
produces wrong results, 
whereas the query that is already expanded as (not A) or (not B) produces 
correct output.

The physical plans look like this:

{code}
wrong results-- Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = 
0.0) && ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
correct results -- Filter ((LoanID#8803 = 62231) && (NOT (PaymentsReceived#8816 
= 0.0) || NOT ExplicitRoll#8826 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results: 

{code}
Wrong:
In [15]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
PaymentsReceived=0.0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))").explain(True)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0.0) && 'ExplicitRoll 
IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8625,MnthRate#8626,ObservationMonth#8627,CycleCounter#8628,LoanID#8629,Loankey#8630,OriginationDate#8631,OriginationQuarter#8632,LoanAmount#8633,Term#8634,LenderRate#8635,ProsperRating#8636,ScheduledMonthlyPaymentAmount#8637,ChargeoffMonth#8638,ChargeoffAmount#8639,CompletedMonth#8640,MonthOfLastPayment#8641,PaymentsReceived#8642,CollectionFees#8643,PrincipalPaid#8644,InterestPaid#8645,LateFees#8646,ServicingFees#8647,RecoveryPayments#8648,RecoveryPrin#8649,DaysPastDue#8650,PriorMonthDPD#8651,ExplicitRoll#8652,SummaryRoll#8653,CumulPrin#8654,EOMPrin#8655,ScheduledPrinRemaining#8656,ScheduledCumulPrin#8657,ScheduledPeriodicPrin#8658,BOMPrin#8659,ListingNumber#8660,DebtSaleMonth#8661,GrossCashFromDebtSale#8662,DebtSaleFee#8663,NetCashToInvestorsFromDebtSale#8664,OZVintage#8665]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = cast(0.0 as 
double)) && ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8625,MnthRate#8626,ObservationMonth#8627,CycleCounter#8628,LoanID#8629,Loankey#8630,OriginationDate#8631,OriginationQuarter#8632,LoanAmount#8633,Term#8634,LenderRate#8635,ProsperRating#8636,ScheduledMonthlyPaymentAmount#8637,ChargeoffMonth#8638,ChargeoffAmount#8639,CompletedMonth#8640,MonthOfLastPayment#8641,PaymentsReceived#8642,CollectionFees#8643,PrincipalPaid#8644,InterestPaid#8645,LateFees#8646,ServicingFees#8647,RecoveryPayments#8648,RecoveryPrin#8649,DaysPastDue#8650,PriorMonthDPD#8651,ExplicitRoll#8652,SummaryRoll#8653,CumulPrin#8654,EOMPrin#8655,ScheduledPrinRemaining#8656,ScheduledCumulPrin#8657,ScheduledPeriodicPrin#8658,BOMPrin#8659,ListingNumber#8660,DebtSaleMonth#8661,GrossCashFromDebtSale#8662,DebtSaleFee#8663,NetCashToInvestorsFromDebtSale#8664,OZVintage#8665]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8629 = 62231) && NOT ((PaymentsReceived#8642 = 0.0) && 
ExplicitRoll#8652 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 

[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:27 PM:
---

Below is the explain plan.
To be clear, the query that contains not (A and B):
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))
{code}  
produces wrong results, 
whereas the query that is already expanded as (not A) or (not B) produces 
correct output.
By the way, I saw cast(0 as double) in the explain plan, so I tried changing 
0 to 0.0, but it made no difference.


Physical plan looks similar:

{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll 
IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT 
ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results: 

{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))").explain(True)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN 
(PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as 
double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 

[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 1:47 PM:
---

In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))").explain(True)
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN 
(PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as 
double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Physical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 Scan 

[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:22 PM:
---

Below is the explain plan.
To be clear, the query that contains 
{code}
and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))
{code}  
produces wrong results, whereas the one that is already expanded as 
(not A) or (not B) produces correct output.


Physical plan looks similar:

{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll 
IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT 
ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results: 

{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
'PreviouslyChargedOff'))").explain(True)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN 
(PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as 
double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 

[jira] [Comment Edited] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048619#comment-15048619
 ] 

Irakli Machabeli edited comment on SPARK-12218 at 12/9/15 2:25 PM:
---

Below is the explain plan.
To be clear, the query that contains not (A and B):
{code}
"and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))"
{code}  
produces wrong results, 
while the query that is already expanded as (not A) or (not B) produces correct 
output.
Incidentally, the explain plan shows cast(0 as double), so I tried changing 0 to 
0.0, but it made no difference.
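
As a sanity check on the expectation that the two filters should be equivalent, here is a minimal sketch in plain Python (not Spark code) that enumerates SQL's three truth values, with None standing in for NULL, and looks for any input pair where not (A and B) differs from (not A) or (not B). The helper names are hypothetical and exist only for this illustration.

```python
from itertools import product

# SQL truth values: True, False, and None standing in for NULL (unknown).
VALUES = (True, False, None)


def sql_not(a):
    # NOT NULL is NULL; otherwise ordinary negation.
    return None if a is None else (not a)


def sql_and(a, b):
    # FALSE dominates AND, then NULL, otherwise TRUE.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a, b):
    # TRUE dominates OR, then NULL, otherwise FALSE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def de_morgan_mismatches():
    """Return every (a, b) where NOT (a AND b) differs from (NOT a) OR (NOT b)."""
    return [
        (a, b)
        for a, b in product(VALUES, repeat=2)
        if sql_not(sql_and(a, b)) != sql_or(sql_not(a), sql_not(b))
    ]


print(de_morgan_mismatches())  # prints []
```

Since the list is empty, the identity holds even when either operand is NULL, so NULLs in PaymentsReceived or ExplicitRoll would not by themselves explain the 18 vs. 28 counts.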


Physical plan looks similar:

{code}
'Filter (('LoanID = 62231) && (NOT ('PaymentsReceived = 0) || NOT 'ExplicitRoll 
IN (PreviouslyPaidOff,PreviouslyChargedOff)))
Filter ((LoanID#8588 = 62231) && (NOT (PaymentsReceived#8601 = 0.0) || NOT 
ExplicitRoll#8611 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
{code}

Explain plan results: 

{code}
In [13]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").explain(True)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('LoanID = 62231) && NOT (('PaymentsReceived = 0) && 'ExplicitRoll IN 
(PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Analyzed Logical Plan ==
BorrowerRate: double, MnthRate: double, ObservationMonth: date, CycleCounter: 
int, LoanID: int, Loankey: string, OriginationDate: date, OriginationQuarter: 
string, LoanAmount: double, Term: int, LenderRate: double, ProsperRating: 
string, ScheduledMonthlyPaymentAmount: double, ChargeoffMonth: date, 
ChargeoffAmount: double, CompletedMonth: date, MonthOfLastPayment: date, 
PaymentsReceived: double, CollectionFees: double, PrincipalPaid: double, 
InterestPaid: double, LateFees: double, ServicingFees: double, 
RecoveryPayments: double, RecoveryPrin: double, DaysPastDue: int, 
PriorMonthDPD: int, ExplicitRoll: string, SummaryRoll: string, CumulPrin: 
double, EOMPrin: double, ScheduledPrinRemaining: double, ScheduledCumulPrin: 
double, ScheduledPeriodicPrin: double, BOMPrin: double, ListingNumber: int, 
DebtSaleMonth: int, GrossCashFromDebtSale: double, DebtSaleFee: double, 
NetCashToInvestorsFromDebtSale: double, OZVintage: string
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = cast(0 as 
double)) && ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 
Relation[BorrowerRate#8543,MnthRate#8544,ObservationMonth#8545,CycleCounter#8546,LoanID#8547,Loankey#8548,OriginationDate#8549,OriginationQuarter#8550,LoanAmount#8551,Term#8552,LenderRate#8553,ProsperRating#8554,ScheduledMonthlyPaymentAmount#8555,ChargeoffMonth#8556,ChargeoffAmount#8557,CompletedMonth#8558,MonthOfLastPayment#8559,PaymentsReceived#8560,CollectionFees#8561,PrincipalPaid#8562,InterestPaid#8563,LateFees#8564,ServicingFees#8565,RecoveryPayments#8566,RecoveryPrin#8567,DaysPastDue#8568,PriorMonthDPD#8569,ExplicitRoll#8570,SummaryRoll#8571,CumulPrin#8572,EOMPrin#8573,ScheduledPrinRemaining#8574,ScheduledCumulPrin#8575,ScheduledPeriodicPrin#8576,BOMPrin#8577,ListingNumber#8578,DebtSaleMonth#8579,GrossCashFromDebtSale#8580,DebtSaleFee#8581,NetCashToInvestorsFromDebtSale#8582,OZVintage#8583]
 ParquetRelation[file:/d:/MktLending/prp_enh1]

== Optimized Logical Plan ==
Filter ((LoanID#8547 = 62231) && NOT ((PaymentsReceived#8560 = 0.0) && 
ExplicitRoll#8570 IN (PreviouslyPaidOff,PreviouslyChargedOff)))
 

[jira] [Commented] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-09 Thread Irakli Machabeli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049194#comment-15049194
 ] 

Irakli Machabeli commented on SPARK-12218:
--


{code}
scala> val df = sqlContext.read.parquet(pathOne).where("c < 6 and not (a = 2 and b in ('1', '2'))")
df: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]

scala> df.explain(true)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && NOT (('a = 2) && 'b IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]

== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]

== Optimized Logical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]

== Physical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
 Scan ParquetRelation[file:/D:/tmp/test][a#30,b#31,c#32]

Code Generation: true

{noformat}



{code}
scala> val df2 = sqlContext.read.parquet(pathOne).where("c < 6 and (not(a = 2) or not(b in ('1', '2')))")
df2: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]

scala> df2.explain(true)
{code}

{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && (NOT ('a = 2) || NOT 'b IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]

== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]

== Optimized Logical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]

== Physical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
 Scan ParquetRelation[file:/D:/tmp/test][a#34,b#35,c#36]

Code Generation: true
{noformat}

> Boolean logic in sql does not work  "not (A and B)" is not the same as  "(not 
> A) or (not B)"
> 
>
> Key: SPARK-12218
> URL: https://issues.apache.org/jira/browse/SPARK-12218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Irakli Machabeli
>Priority: Blocker
>
> Two logically equivalent queries produce different results:
> In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( 
> PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 
> 'PreviouslyChargedOff'))").count()
> Out[2]: 18
> In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( 
> not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 
> 'PreviouslyChargedOff')))").count()
> Out[3]: 28






[jira] [Created] (SPARK-12218) Boolean logic in sql does not work "not (A and B)" is not the same as "(not A) or (not B)"

2015-12-08 Thread Irakli Machabeli (JIRA)
Irakli Machabeli created SPARK-12218:


 Summary: Boolean logic in sql does not work  "not (A and B)" is 
not the same as  "(not A) or (not B)"
 Key: SPARK-12218
 URL: https://issues.apache.org/jira/browse/SPARK-12218
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.2
Reporter: Irakli Machabeli
Priority: Blocker


Two logically equivalent queries produce different results:


In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").count()

Out[2]: 18

In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff')))").count()
Out[3]: 28







[jira] [Created] (SPARK-12095) Window function rowsBetween throws exception

2015-12-02 Thread Irakli Machabeli (JIRA)
Irakli Machabeli created SPARK-12095:


 Summary: Window function rowsBetween throws exception
 Key: SPARK-12095
 URL: https://issues.apache.org/jira/browse/SPARK-12095
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1
Reporter: Irakli Machabeli


From pyspark:
 windowSpec=Window.partitionBy('A', 'B').orderBy('A','B', 
'C').rowsBetween('UNBOUNDED PRECEDING','CURRENT')

Py4JError: An error occurred while calling o1107.rowsBetween. Trace:
py4j.Py4JException: Method rowsBetween([class java.lang.String, class 
java.lang.Long]) does not exist


From a SQL query, the parser fails immediately:

Py4JJavaError: An error occurred while calling o18.sql.
: java.lang.RuntimeException: [1.20] failure: ``union'' expected but `(' found

select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl
   ^
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
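
For reference, the frame being requested (all rows from the start of the partition up to the current row) can be illustrated without Spark. The sketch below is plain Python with hypothetical names (`running_sum`, the sample rows, and the value column `v` are assumptions, not part of the report); it only shows what such a frame computes. Note also that the Py4J message above reports a java.lang.String argument, and as far as I can tell from the PySpark docs, `rowsBetween` takes integer row offsets relative to the current row (0 meaning the current row) rather than SQL keywords.

```python
from itertools import groupby
from operator import itemgetter


def running_sum(rows, partition_by, order_by, value):
    """Sum `value` over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."""
    part_key = itemgetter(*partition_by)
    order_key = itemgetter(*order_by)
    out = []
    # Sort by partition key first so groupby sees each partition as one run.
    for _, part in groupby(sorted(rows, key=part_key), key=part_key):
        total = 0
        for row in sorted(part, key=order_key):
            total += row[value]  # the frame grows by one row at each step
            out.append(dict(row, running=total))
    return out


# Hypothetical sample data using the column names A, B, C from the report.
rows = [
    {"A": "x", "B": 1, "C": 2, "v": 10},
    {"A": "x", "B": 1, "C": 1, "v": 5},
    {"A": "y", "B": 1, "C": 1, "v": 7},
]
result = running_sum(rows, ["A", "B"], ["C"], "v")
print([r["running"] for r in result])  # prints [5, 15, 7]
```

Ordering by C alone is enough here because A and B are constant within each partition.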



