[jira] [Assigned] (SPARK-27917) Semantic equals of CaseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27917:


Assignee: Apache Spark

> Semantic equals of CaseWhen is failing with case sensitivity of column Names
> 
>
> Key: SPARK-27917
> URL: https://issues.apache.org/jira/browse/SPARK-27917
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Spark-2.3.2
>Reporter: Akash R Nilugal
>Assignee: Apache Spark
>Priority: Major
>
> Semantic equals of CaseWhen is failing with case sensitivity of column Names






[jira] [Assigned] (SPARK-27917) Semantic equals of CaseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27917:


Assignee: (was: Apache Spark)

> Semantic equals of CaseWhen is failing with case sensitivity of column Names
> 
>
> Key: SPARK-27917
> URL: https://issues.apache.org/jira/browse/SPARK-27917
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Spark-2.3.2
>Reporter: Akash R Nilugal
>Priority: Major
>
> Semantic equals of CaseWhen is failing with case sensitivity of column Names






[jira] [Commented] (SPARK-27917) Semantic equals of CaseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853869#comment-16853869
 ] 

Sandeep Katta commented on SPARK-27917:
---

This can be reproduced with the following unit-test snippet:

import org.apache.spark.sql.catalyst.expressions.{AttributeReference, CaseWhen, Literal}
import org.apache.spark.sql.types.StringType

// The same attribute, differing only in the case of its name
val attrRef = AttributeReference("ACCESS_CHECK", StringType)()
val aliasAttrRef = attrRef.withName("access_check")
val caseWhenObj1 = CaseWhen(Seq((attrRef, Literal("A"))))
val caseWhenObj2 = CaseWhen(Seq((aliasAttrRef, Literal("A"))))
// Fails before the fix: the two expressions should be semantically equal
assert(caseWhenObj1.semanticEquals(caseWhenObj2))

> Semantic equals of CaseWhen is failing with case sensitivity of column Names
> 
>
> Key: SPARK-27917
> URL: https://issues.apache.org/jira/browse/SPARK-27917
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Spark-2.3.2
>Reporter: Akash R Nilugal
>Priority: Major
>
> Semantic equals of CaseWhen is failing with case sensitivity of column Names






[jira] [Updated] (SPARK-27917) Semantic equals of CaseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Sandeep Katta (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Katta updated SPARK-27917:
--
Description: Semantic equals of CaseWhen is failing with case sensitivity 
of column Names  (was: Semantic equals of CsseWhen is failing with case 
sensitivity of column Names)

> Semantic equals of CaseWhen is failing with case sensitivity of column Names
> 
>
> Key: SPARK-27917
> URL: https://issues.apache.org/jira/browse/SPARK-27917
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Spark-2.3.2
>Reporter: Akash R Nilugal
>Priority: Major
>
> Semantic equals of CaseWhen is failing with case sensitivity of column Names






[jira] [Updated] (SPARK-27917) Semantic equals of CaseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Sandeep Katta (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Katta updated SPARK-27917:
--
Summary: Semantic equals of CaseWhen is failing with case sensitivity of 
column Names  (was: Semantic equals of CsseWhen is failing with case 
sensitivity of column Names)

> Semantic equals of CaseWhen is failing with case sensitivity of column Names
> 
>
> Key: SPARK-27917
> URL: https://issues.apache.org/jira/browse/SPARK-27917
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Spark-2.3.2
>Reporter: Akash R Nilugal
>Priority: Major
>
> Semantic equals of CsseWhen is failing with case sensitivity of column Names






[jira] [Created] (SPARK-27917) Semantic equals of CsseWhen is failing with case sensitivity of column Names

2019-06-01 Thread Akash R Nilugal (JIRA)
Akash R Nilugal created SPARK-27917:
---

 Summary: Semantic equals of CsseWhen is failing with case 
sensitivity of column Names
 Key: SPARK-27917
 URL: https://issues.apache.org/jira/browse/SPARK-27917
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.2
 Environment: Spark-2.3.2
Reporter: Akash R Nilugal


Semantic equals of CsseWhen is failing with case sensitivity of column Names






[jira] [Created] (SPARK-27916) SparkThriftServer memory leak

2019-06-01 Thread angerszhu (JIRA)
angerszhu created SPARK-27916:
-

 Summary: SparkThriftServer memory leak
 Key: SPARK-27916
 URL: https://issues.apache.org/jira/browse/SPARK-27916
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 2.3.0
Reporter: angerszhu


When we use SparkThriftServer with the config

spark.sql.hive.thriftServer.singleSession = true

each client session creates a SparkSession object, but this 
InheritableThreadLocal value is never released: when we call 
SparkSession.sql(), it only calls ThreadLocal's set method. So the 
SparkSession objects remain in the JVM and are never cleared.
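
A minimal sketch of the leak pattern being described, with assumed names for 
illustration (not Spark's actual internals):

{code:java}
// A value set on an InheritableThreadLocal stays strongly referenced by the
// pool thread until remove() is called. If only set() is ever invoked, the
// stored session object can never be collected while the thread lives.
object SessionHolder {
  private val activeSession = new InheritableThreadLocal[AnyRef]

  def onSql(session: AnyRef): Unit =
    activeSession.set(session)  // called on every sql(); nothing ever removes it

  // Missing cleanup along the lines of:
  //   def onSessionClosed(): Unit = activeSession.remove()
}
{code}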

 






[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-06-01 Thread Jason Guo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Guo updated SPARK-27792:
--
Shepherd:   (was: Dongjoon Hyun)

> SkewJoin--handle only skewed keys with broadcastjoin and other keys with 
> normal join
> 
>
> Key: SPARK-27792
> URL: https://issues.apache.org/jira/browse/SPARK-27792
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Jason Guo
>Priority: Major
> Attachments: SMJ DAG.png, SMJ tasks.png, skew join DAG.png, sql.png, 
> time.png
>
>
> This feature is designed to handle data skew in joins.
>  
> *Scenario*
>  * A big table (big_skewed) which contains a few skewed keys
>  * A small table (small_even) which has no skewed keys and is larger than the 
> broadcast threshold 
>  * When big_skewed.join(small_even), a few tasks will be much slower than the 
> others because they have to handle the skewed keys
> *Solution*
>  * Provide a hint to indicate which keys are skewed
>  * Handle the skewed keys with a broadcast join and join the non-skewed keys 
> with the normal join method
>  * For the small table, the whole table is larger than the broadcast 
> threshold, but the total size of its records matching the big table's skewed 
> keys is smaller than the threshold, so those records can be joined with the 
> big table using a broadcast join
>  * The other records, with non-skewed keys, can be joined with the normal 
> join method
>  * The final result is the union of the two parts above (see the sketch 
> after this description)
> *Effect*
> This feature reduces the join time from 5.7 minutes to 2.1 minutes
> !time.png!
> !sql.png!  
>  
> *Experiment*
> *Without this feature, the whole job took 5.7 minutes*
> tableA has 2 skewed keys 9500048 and 9500096
> {code:java}
> INSERT OVERWRITE TABLE big_skewed
> SELECT CAST(CASE WHEN id < 90800 THEN (950 + (CAST (RAND() * 2 AS 
> INT) + 1) * 48 )
>  ELSE CAST(id/100 AS INT) END AS STRING), 'A'
>  name
> FROM ids
> WHERE id BETWEEN 9 AND 105000;{code}
> tableB has no skewed keys
> {code:java}
> INSERT OVERWRITE TABLE small_even
> SELECT CAST(CAST(id/100 AS INT) AS STRING), 'B'
>  name
> FROM ids
> WHERE id BETWEEN 95000 AND 95050;{code}
>  
> Join them with setting spark.sql.autoBroadcastJoinThreshold to 3000
> {code:java}
> insert overwrite table result_with_skew
> select big_skewed.id, big_skewed.value, small_even.value
> from big_skewed
> join small_even
> on small_even.id=big_skewed.id;
> {code}
>  
> The sort merge join is slow with 2 straggler tasks
> !SMJ DAG.png!  
>   !SMJ tasks.png!
>  
> *With this feature, the job took only 2.1 minutes*
> The skewed keys are joined with a broadcast join and the non-skewed keys are 
> joined with a sort merge join
> !skew join DAG.png!  
>  
>  
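
A hand-rolled sketch of the approach described in *Solution*, using the table 
names and skewed keys from the experiment above (a sketch under those 
assumptions, not the proposed hint implementation): broadcast-join only the 
skewed slice, sort-merge-join the rest, and union the results.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder.getOrCreate()
// Keys assumed skewed, taken from the experiment above
val skewedKeys = Seq("9500048", "9500096")

val big = spark.table("big_skewed")
val small = spark.table("small_even")

// Broadcast join only the skewed slice: the slice of small_even matching the
// skewed keys fits under the broadcast threshold even though the whole table
// does not.
val skewedPart = big.filter(big("id").isin(skewedKeys: _*))
  .join(broadcast(small.filter(small("id").isin(skewedKeys: _*))), "id")

// Normal (sort merge) join for everything else.
val normalPart = big.filter(!big("id").isin(skewedKeys: _*))
  .join(small, "id")

val result = skewedPart.unionByName(normalPart)
{code}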






[jira] [Assigned] (SPARK-27915) Update logical Filter's output nullability based on IsNotNull conditions

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27915:


Assignee: Apache Spark  (was: Josh Rosen)

> Update logical Filter's output nullability based on IsNotNull conditions
> 
>
> Key: SPARK-27915
> URL: https://issues.apache.org/jira/browse/SPARK-27915
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Major
>
> The physical `FilterExec` operator has logic to update its output nullability 
> based on IsNotNull expressions; it might make sense to add similar logic to 
> the logical Filter operator.






[jira] [Assigned] (SPARK-27915) Update logical Filter's output nullability based on IsNotNull conditions

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27915:


Assignee: Josh Rosen  (was: Apache Spark)

> Update logical Filter's output nullability based on IsNotNull conditions
> 
>
> Key: SPARK-27915
> URL: https://issues.apache.org/jira/browse/SPARK-27915
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> The physical `FilterExec` operator has logic to update its output nullability 
> based on IsNotNull expressions; it might make sense to add similar logic to 
> the logical Filter operator.






[jira] [Created] (SPARK-27915) Update logical Filter's output nullability based on IsNotNull conditions

2019-06-01 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27915:
--

 Summary: Update logical Filter's output nullability based on 
IsNotNull conditions
 Key: SPARK-27915
 URL: https://issues.apache.org/jira/browse/SPARK-27915
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen


The physical `FilterExec` operator has logic to update its output nullability 
based on IsNotNull expressions; it might make sense to add similar logic to the 
logical Filter operator.
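
A minimal sketch of that idea with simplified, assumed signatures (not the 
actual FilterExec code): collect the attributes the condition proves non-null 
and tighten their nullability in the output.

{code:java}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, IsNotNull}

// Sketch only: mark output attributes non-nullable when the filter condition
// contains an IsNotNull check on them.
def tightenNullability(output: Seq[Attribute], condition: Expression): Seq[Attribute] = {
  // Attributes proven non-null somewhere in the condition tree.
  val notNull = condition.collect { case IsNotNull(a: Attribute) => a.exprId }.toSet
  output.map { a =>
    if (a.nullable && notNull.contains(a.exprId)) a.withNullability(false) else a
  }
}
{code}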






[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

2019-06-01 Thread Karthik Palaniappan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853745#comment-16853745
 ] 

Karthik Palaniappan commented on SPARK-24815:
-

Just to clarify: does Spark allow having multiple tasks per Kafka partition? 
This doc implies that they are 1:1: 
[https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html]. 
Batch dynamic allocation gives you enough executors to process all tasks in 
the running stage in parallel, so I assume that if you end up with too much 
data per partition, your only recourse would be to increase the number of 
Kafka partitions.

Also, does Spark not handle state rebalancing today? In other words, is SS 
already fault tolerant to node failures?

Do you have any good references (JIRAs, design docs) on how Spark stores 
streaming state internally? I see plenty of blogs and articles on how to do 
stateful processing, but not on how it works under the hood.

> Structured Streaming should support dynamic allocation
> --
>
> Key: SPARK-24815
> URL: https://issues.apache.org/jira/browse/SPARK-24815
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Karthik Palaniappan
>Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled
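
For reference, the batch heuristic described above is driven by existing 
configuration keys; a minimal sketch of the relevant settings (values are 
illustrative only):

{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Request more executors once tasks have been backlogged this long
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // Remove executors that have been idle longer than this
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
{code}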






[jira] [Commented] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2019-06-01 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853736#comment-16853736
 ] 

Yuming Wang commented on SPARK-21117:
-

PostgreSQL supports this function:
||Function||Return Type||Description||Example||Result||
|width_bucket(operand dp, b1 dp, b2 dp, count int)|int|return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range|width_bucket(5.35, 0.024, 10.06, 5)|3|
|width_bucket(operand numeric, b1 numeric, b2 numeric, count int)|int|return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range|width_bucket(5.35, 0.024, 10.06, 5)|3|
|width_bucket(operand anyelement, thresholds anyarray)|int|return the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained|width_bucket(now(), array['yesterday', 'today', 'tomorrow']::timestamptz[])|2|

[https://www.postgresql.org/docs/11/functions-math.html]
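
As a sketch of the equal-width semantics above ({{widthBucket}} is an assumed 
standalone helper, not an existing Spark or PostgreSQL API):

{code:java}
// Returns 0 below the range, count + 1 at or above it, otherwise the 1-based
// index of the equal-width bucket that operand falls into.
def widthBucket(operand: Double, b1: Double, b2: Double, count: Int): Int = {
  if (operand < b1) 0
  else if (operand >= b2) count + 1
  else (((operand - b1) / (b2 - b1)) * count).toInt + 1
}

// widthBucket(5.35, 0.024, 10.06, 5) == 3, matching the examples above.
{code}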

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: bulk-closed
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717






[jira] [Updated] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2019-06-01 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-21117:

Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-27764

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: bulk-closed
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717






[jira] [Updated] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2019-06-01 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-21117:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-20746)

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: bulk-closed
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717






[jira] [Commented] (SPARK-23906) Add UDF trunc(numeric)

2019-06-01 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853733#comment-16853733
 ] 

Yuming Wang commented on SPARK-23906:
-

PostgreSQL has two functions: {{trunc}} and {{date_trunc}}. Maybe we need to 
merge {{TruncDate}} and {{TruncTimestamp}} into {{date_trunc}} and make 
{{trunc}} support truncating numbers.

 

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC]
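
For illustration, PostgreSQL-style numeric truncation (toward zero, to a given 
scale) could look like the sketch below; {{trunc}} here is an assumed 
standalone helper, not the proposed Spark expression:

{code:java}
// trunc(v, s): truncate v toward zero to s decimal places.
def trunc(v: BigDecimal, scale: Int = 0): BigDecimal =
  v.setScale(scale, BigDecimal.RoundingMode.DOWN)

// trunc(BigDecimal("42.8"))       == 42
// trunc(BigDecimal("42.4382"), 2) == 42.43
{code}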

 

 

> Add UDF trunc(numeric)
> --
>
> Key: SPARK-23906
> URL: https://issues.apache.org/jira/browse/SPARK-23906
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Yuming Wang
>Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-14582
> We already have {{date_trunc}} and {{trunc}}. Need to discuss whether we 
> should introduce a new name or reuse {{trunc}} for truncating numbers.






[jira] [Reopened] (SPARK-20856) support statement using nested joins

2019-06-01 Thread N Campbell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

N Campbell reopened SPARK-20856:


Per the prior comment: this is an enhancement request asking Spark SQL to 
provide better parity with the joined-table syntax that many other systems 
support.

> support statement using nested joins
> 
>
> Key: SPARK-20856
> URL: https://issues.apache.org/jira/browse/SPARK-20856
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: N Campbell
>Priority: Major
>  Labels: bulk-closed
>
> While DB2, Oracle, etc. support a join expressed as follows, Spark SQL does 
> not. 
> Not supported
> select * from 
>   cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
>  on tbint.rnum = tint.rnum
>  on tint.rnum = tsint.rnum
> versus written as shown
> select * from 
>   cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner 
> join cert.tbint tbint on tint.rnum = tbint.rnum
>
> ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', 
> 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 
> 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 
> 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', 
> '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', 
> '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5)
> == SQL ==
> select * from 
>   cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
>  on tbint.rnum = tint.rnum
>  on tint.rnum = tsint.rnum
> -^^^
> , Query: select * from 
>   cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
>  on tbint.rnum = tint.rnum
>  on tint.rnum = tsint.rnum.
> SQLState:  HY000
> ErrorCode: 500051






[jira] [Resolved] (SPARK-27847) One-Pass MultilabelMetrics & MulticlassMetrics

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27847.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24717
[https://github.com/apache/spark/pull/24717]

> One-Pass MultilabelMetrics & MulticlassMetrics
> --
>
> Key: SPARK-27847
> URL: https://issues.apache.org/jira/browse/SPARK-27847
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>
> In MultilabelMetrics, a separate pass is needed to compute:
> {color:#93a6f5}numDocs/numLabels/{color}{color:#93a6f5}subsetAccuracy/accuracy/hammingLoss/precision/recall/f1Measure/{color}{color:#93a6f5}tpPerClass/fpPerClass/fpPerClass/fnPerClass{color}
>  
> And in MulticlassMetrics, a separate pass is needed to compute:
> {color:#93a6f5}labelCountByClass{color}{color:#93a6f5}/tpByClass/{color}{color:#93a6f5}fpByClass/confusions{color}
>  
> However, all of the above intermediate metrics can be computed at once.
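
For illustration, the single-pass idea can be sketched as one fold that 
updates every counter together (assumed, simplified data shape of one pair of 
label sets per document; not the actual MLlib code):

{code:java}
case class Counts(docs: Long, exactMatches: Long, truePositives: Long)

// One traversal accumulates all counters at once, so a single job over the
// data suffices instead of one pass per metric.
def onePass(rows: Iterator[(Set[Double], Set[Double])]): Counts =
  rows.foldLeft(Counts(0L, 0L, 0L)) { case (c, (predictions, labels)) =>
    Counts(
      c.docs + 1,
      c.exactMatches + (if (predictions == labels) 1L else 0L),
      c.truePositives + predictions.intersect(labels).size)
  }
{code}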






[jira] [Assigned] (SPARK-27847) One-Pass MultilabelMetrics & MulticlassMetrics

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27847:
-

Assignee: zhengruifeng

> One-Pass MultilabelMetrics & MulticlassMetrics
> --
>
> Key: SPARK-27847
> URL: https://issues.apache.org/jira/browse/SPARK-27847
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>
> In MultilabelMetrics, a separate pass is needed to compute:
> {color:#93a6f5}numDocs/numLabels/{color}{color:#93a6f5}subsetAccuracy/accuracy/hammingLoss/precision/recall/f1Measure/{color}{color:#93a6f5}tpPerClass/fpPerClass/fpPerClass/fnPerClass{color}
>  
> And in MulticlassMetrics, a separate pass is needed to compute:
> {color:#93a6f5}labelCountByClass{color}{color:#93a6f5}/tpByClass/{color}{color:#93a6f5}fpByClass/confusions{color}
>  
> However, all of the above intermediate metrics can be computed at once.






[jira] [Updated] (SPARK-27847) One-Pass MultilabelMetrics & MulticlassMetrics

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-27847:
--
Priority: Minor  (was: Major)

> One-Pass MultilabelMetrics & MulticlassMetrics
> --
>
> Key: SPARK-27847
> URL: https://issues.apache.org/jira/browse/SPARK-27847
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Priority: Minor
>
> In MultilabelMetrics, a separate pass is needed to compute:
> {color:#93a6f5}numDocs/numLabels/{color}{color:#93a6f5}subsetAccuracy/accuracy/hammingLoss/precision/recall/f1Measure/{color}{color:#93a6f5}tpPerClass/fpPerClass/fpPerClass/fnPerClass{color}
>  
> And in MulticlassMetrics, a separate pass is needed to compute:
> {color:#93a6f5}labelCountByClass{color}{color:#93a6f5}/tpByClass/{color}{color:#93a6f5}fpByClass/confusions{color}
>  
> However, all of the above intermediate metrics can be computed at once.






[jira] [Assigned] (SPARK-27811) Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a little ambiguous

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27811:
-

Assignee: jiaan.geng

>  Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a 
> little ambiguous
> 
>
> Key: SPARK-27811
> URL: https://issues.apache.org/jira/browse/SPARK-27811
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Trivial
>
> I found that the docs of {{spark.driver.memoryOverhead}} and 
> {{spark.executor.memoryOverhead}} are a little ambiguous.
> For example, the original docs of {{spark.driver.memoryOverhead}} start with 
> {{The amount of off-heap memory to be allocated per driver in cluster mode}}.
> But {{MemoryManager}} also manages a memory area named off-heap, used to 
> allocate memory in Tungsten mode.
> So I think the description of {{spark.driver.memoryOverhead}} is confusing.
> {{spark.executor.memoryOverhead}} has the same problem as 
> {{spark.driver.memoryOverhead}}.
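
The two distinct notions of "off-heap" being conflated can be seen in the 
configuration keys themselves; a minimal sketch with illustrative values:

{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Overhead outside the JVM heap: VM internals, native libraries, etc.
  .set("spark.executor.memoryOverhead", "1g")
  // The off-heap area managed by MemoryManager for Tungsten
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "2g")
{code}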






[jira] [Resolved] (SPARK-27811) Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a little ambiguous

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27811.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24671
[https://github.com/apache/spark/pull/24671]

>  Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a 
> little ambiguous
> 
>
> Key: SPARK-27811
> URL: https://issues.apache.org/jira/browse/SPARK-27811
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Trivial
> Fix For: 3.0.0
>
>
> I found that the docs of {{spark.driver.memoryOverhead}} and 
> {{spark.executor.memoryOverhead}} are a little ambiguous.
> For example, the original docs of {{spark.driver.memoryOverhead}} start with 
> {{The amount of off-heap memory to be allocated per driver in cluster mode}}.
> But {{MemoryManager}} also manages a memory area named off-heap, used to 
> allocate memory in Tungsten mode.
> So I think the description of {{spark.driver.memoryOverhead}} is confusing.
> {{spark.executor.memoryOverhead}} has the same problem as 
> {{spark.driver.memoryOverhead}}.






[jira] [Updated] (SPARK-27811) Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a little ambiguous

2019-06-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-27811:
--
Priority: Trivial  (was: Major)

>  Docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a 
> little ambiguous
> 
>
> Key: SPARK-27811
> URL: https://issues.apache.org/jira/browse/SPARK-27811
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: jiaan.geng
>Priority: Trivial
>
> I found that the docs of {{spark.driver.memoryOverhead}} and 
> {{spark.executor.memoryOverhead}} are a little ambiguous.
> For example, the original docs of {{spark.driver.memoryOverhead}} start with 
> {{The amount of off-heap memory to be allocated per driver in cluster mode}}.
> But {{MemoryManager}} also manages a memory area named off-heap, used to 
> allocate memory in Tungsten mode.
> So I think the description of {{spark.driver.memoryOverhead}} is confusing.
> {{spark.executor.memoryOverhead}} has the same problem as 
> {{spark.driver.memoryOverhead}}.






[jira] [Assigned] (SPARK-18570) Consider supporting other R formula operators

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18570:


Assignee: (was: Apache Spark)

> Consider supporting other R formula operators
> -
>
> Key: SPARK-18570
> URL: https://issues.apache.org/jira/browse/SPARK-18570
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SparkR
>Reporter: Felix Cheung
>Priority: Minor
>
> Such as
> {code}
> ∗ 
>  X∗Y include these variables and the interactions between them
> ^
>  (X + Z + W)^3 include these variables and all interactions up to three way
> |
>  X | Z conditioning: include x given z
> {code}
> Others include %in% and ` (backtick).
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html






[jira] [Assigned] (SPARK-18570) Consider supporting other R formula operators

2019-06-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18570:


Assignee: Apache Spark

> Consider supporting other R formula operators
> -
>
> Key: SPARK-18570
> URL: https://issues.apache.org/jira/browse/SPARK-18570
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SparkR
>Reporter: Felix Cheung
>Assignee: Apache Spark
>Priority: Minor
>
> Such as
> {code}
> ∗ 
>  X∗Y include these variables and the interactions between them
> ^
>  (X + Z + W)^3 include these variables and all interactions up to three way
> |
>  X | Z conditioning: include x given z
> {code}
> Others include %in% and ` (backtick).
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html


