[jira] [Commented] (SPARK-28360) The serviceAccountName configuration item does not take effect in client mode.
[ https://issues.apache.org/jira/browse/SPARK-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376901#comment-17376901 ] Mathew Wicks commented on SPARK-28360: -- [~holden] this issue is still present. That is, the docs still imply that `spark.kubernetes.authenticate.serviceAccountName` will set the serviceAccountName in client mode. You can [see the faulty docs|https://spark.apache.org/docs/3.1.2/running-on-kubernetes.html] under the "Meaning" column of the `spark.kubernetes.authenticate.driver.serviceAccountName` config of the Spark 3.1.2 docs. {quote}Service account that is used when running the driver pod. The driver pod uses this service account when requesting executor pods from the API server. Note that this cannot be specified alongside a CA cert file, client key file, client cert file, and/or OAuth token. In client mode, use spark.kubernetes.authenticate.serviceAccountName instead. {quote} > The serviceAccountName configuration item does not take effect in client mode. > -- > > Key: SPARK-28360 > URL: https://issues.apache.org/jira/browse/SPARK-28360 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: zhixingheyi_tian >Priority: Major > > From the configuration item description in the Spark documentation: > https://spark.apache.org/docs/latest/running-on-kubernetes.html > > “spark.kubernetes.authenticate.driver.serviceAccountName default Service > account that is used when running the driver pod. The driver pod uses this > service account when requesting executor pods from the API server. Note that > this cannot be specified alongside a CA cert file, client key file, client > cert file, and/or OAuth token. In client mode, use > spark.kubernetes.authenticate.serviceAccountName instead.” > But in client mode, “spark.kubernetes.authenticate.serviceAccountName” does > not actually take effect. 
> From an analysis of the source code, Spark never reads the configuration item > "spark.kubernetes.authenticate.serviceAccountName". > The unit tests only cover > "spark.kubernetes.authenticate.driver.serviceAccountName". > In Kubernetes, a service account provides an identity for processes that run > in a Pod. When you create a pod, if you do not specify a service account, it > is automatically assigned the default service account in the same namespace. > Setting "spec.serviceAccountName" when creating a pod specifies a custom > service account. > So in client mode, if you run your driver inside a Kubernetes pod, the > service account already exists. If your application is not running inside > a pod, no service account is needed at all. > From my point of view, we should just modify the document and delete the > "spark.kubernetes.authenticate.serviceAccountName" configuration item > description: it does not work at the moment, and it also does not need to > work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
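To make the point above concrete: in client mode the driver's identity against the Kubernetes API comes from the pod spec, not from any `spark.kubernetes.*` setting. A minimal sketch of a client-mode driver pod (all names, images, and the command are illustrative, not taken from the issue):

```yaml
# Hypothetical client-mode driver pod. The identity used when requesting
# executor pods is determined here, by spec.serviceAccountName -- if this
# field is omitted, the "default" service account of the namespace is used.
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver          # illustrative name
  namespace: mynamespace
spec:
  serviceAccountName: spark   # the custom service account, e.g. one bound to an RBAC role
  containers:
    - name: driver
      image: my-spark-image:latest   # illustrative image
      command: ["spark-submit", "--master", "k8s://https://kubernetes.default.svc",
                "--deploy-mode", "client", "..."]
```

This is why the reporter argues the config key is unnecessary in client mode: the service account is a property of the pod running the driver, not of the Spark job.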
[jira] [Commented] (SPARK-32226) JDBC TimeStamp predicates always append `.0`
[ https://issues.apache.org/jira/browse/SPARK-32226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156413#comment-17156413 ] Mathew Wicks commented on SPARK-32226: -- [~Chen Zhang], thanks for the idea! However, while `DATETIME YEAR TO SECOND` doesn't support the `.0` suffix, `DATETIME YEAR TO FRACTION` does, so we would need to add some logic to detect the type of the source column. By the way, is adding new dialects something that is accepted into core Spark these days? > JDBC TimeStamp predicates always append `.0` > > > Key: SPARK-32226 > URL: https://issues.apache.org/jira/browse/SPARK-32226 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mathew Wicks >Priority: Major > > If you have an Informix column with type `DATETIME YEAR TO SECOND`, Informix > will not let you pass a filter of the form `2020-01-01 00:00:00.0` (with the > `.0` at the end). > > In Spark 3.0.0, our predicate pushdown will always append this `.0` to the end > of a TimeStamp column filter, even if you don't specify it: > {code:java} > df.where("col1 > '2020-01-01 00:00:00'") > {code} > > I think we should only pass the `.XXX` suffix if the user passes it in the > filter, for example: > {code:java} > df.where("col1 > '2020-01-01 00:00:00.123'") > {code} > > The relevant Spark class is: > {code:java} > org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString > {code} > > To aid people searching for this error, here is the error emitted by Spark: > {code:java} > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133) > at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467) > at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420) > at > org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47) > at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625) > at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2695) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614) > at org.apache.spark.sql.Dataset.head(Dataset.scala:2695) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2902) > at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:337) > at org.apache.spark.sql.Dataset.show(Dataset.scala:824) > at org.apache.spark.sql.Dataset.show(Dataset.scala:783) > at org.apache.spark.sql.Dataset.show(Dataset.scala:792) > ... 47 elided > Caused by: java.sql.SQLException: Extra characters at the end of a datetime > or interval. > at com.i
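The formatting rule proposed in the issue (emit a fractional suffix only when the timestamp actually has a fractional part) can be sketched in Python. This is an illustration of the rule, not Spark's actual `DateTimeUtils.timestampToString` implementation:

```python
from datetime import datetime

def timestamp_to_string(ts: datetime) -> str:
    """Format a timestamp for a pushed-down JDBC predicate.

    Spark 3.0.0 always appends ".0" when the fraction is zero; the
    behaviour proposed in the issue is to omit the suffix entirely
    unless the timestamp carries a fractional part.
    """
    base = ts.strftime("%Y-%m-%d %H:%M:%S")
    if ts.microsecond == 0:
        return base  # no ".0" suffix -- acceptable to DATETIME YEAR TO SECOND
    # Trim trailing zeros from the fractional part, keeping at least one digit.
    frac = f"{ts.microsecond:06d}".rstrip("0")
    return f"{base}.{frac}"

print(timestamp_to_string(datetime(2020, 1, 1)))                    # 2020-01-01 00:00:00
print(timestamp_to_string(datetime(2020, 1, 1, 0, 0, 0, 123000)))  # 2020-01-01 00:00:00.123
```

A dialect-aware fix, as discussed in the comment, would additionally need to know whether the target column is `DATETIME YEAR TO SECOND` or `DATETIME YEAR TO FRACTION` before deciding to emit any suffix at all.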
[jira] [Created] (SPARK-32226) JDBC TimeStamp predicates always append `.0`
Mathew Wicks created SPARK-32226: Summary: JDBC TimeStamp predicates always append `.0` Key: SPARK-32226 URL: https://issues.apache.org/jira/browse/SPARK-32226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Mathew Wicks If you have an Informix column with type `DATETIME YEAR TO SECOND`, Informix will not let you pass a filter of the form `2020-01-01 00:00:00.0` (with the `.0` at the end). In Spark 3.0.0, our predicate pushdown will always append this `.0` to the end of a TimeStamp column filter, even if you don't specify it: {code:java} df.where("col1 > '2020-01-01 00:00:00'") {code} I think we should only pass the `.XXX` suffix if the user passes it in the filter, for example: {code:java} df.where("col1 > '2020-01-01 00:00:00.123'") {code} The relevant Spark class is: {code:java} org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString {code} To aid people searching for this error, here is the error emitted by Spark: {code:java} Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47) at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625) at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2695) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614) at org.apache.spark.sql.Dataset.head(Dataset.scala:2695) at org.apache.spark.sql.Dataset.take(Dataset.scala:2902) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300) at org.apache.spark.sql.Dataset.showString(Dataset.scala:337) at org.apache.spark.sql.Dataset.show(Dataset.scala:824) at org.apache.spark.sql.Dataset.show(Dataset.scala:783) at org.apache.spark.sql.Dataset.show(Dataset.scala:792) ... 
47 elided Caused by: java.sql.SQLException: Extra characters at the end of a datetime or interval. at com.informix.util.IfxErrMsg.buildExceptionWithMessage(IfxErrMsg.java:416) at com.informix.util.IfxErrMsg.buildIsamException(IfxErrMsg.java:401) at com.informix.jdbc.IfxSqli.addException(IfxSqli.java:3096) at com.informix.jdbc.IfxSqli.receiveError(IfxSqli.java:3368) at com.informix.jdbc.IfxSqli.dispatchMsg(IfxSqli.java:2292) at com.informix.jdbc.IfxSqli.receiveMessage(IfxSqli.java:2217) at com.informix.jdbc.IfxSqli.executePrepare(IfxSqli.java:1213) at com.informix.jdbc.IfxPreparedStatement.setupExecutePrepare(IfxPreparedStatement.java:245) at com.informix.jdbc.IfxPreparedStatement.processSQL(IfxPreparedStatement.java:229) at com.informix.jdbc.IfxPreparedStatement.(IfxPreparedStatement.java:119) at
[jira] [Commented] (SPARK-26295) [K8S] serviceAccountName is not set in client mode
[ https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024838#comment-17024838 ] Mathew Wicks commented on SPARK-26295: -- I am still encountering this issue on 2.4.4 (and given SPARK-28360, this issue likely also occurs in Spark 3.0's current preview, but I haven't verified this). Can anyone take a look at this [~dongjoon]? The issue is effectively that `spark.kubernetes.authenticate.driver.serviceAccountName` and `spark.kubernetes.authenticate.serviceAccountName` are ignored in client mode with a K8S master. No matter what you specify, the default service account of the `spark.kubernetes.namespace` namespace is used. > [K8S] serviceAccountName is not set in client mode > -- > > Key: SPARK-26295 > URL: https://issues.apache.org/jira/browse/SPARK-26295 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Adrian Tanase >Priority: Major > > When deploying spark apps in client mode (in my case from inside the driver > pod), one can't specify the service account in accordance with the docs > ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]). > The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is > most likely added in cluster mode only, which would be consistent with > {{spark.kubernetes.authenticate.driver}} being the cluster mode prefix. > We should either inject the service account specified by this property in the > client mode pods, or specify an equivalent config: > {{spark.kubernetes.authenticate.serviceAccountName}} > This is the exception: > {noformat} > Message: Forbidden!Configured service account doesn't have access. Service > account may have been revoked. pods "..." is forbidden: User > "system:serviceaccount:mynamespace:default" cannot get pods in the namespace > "mynamespace"{noformat} > The expectation was to see the user {{mynamespace:spark}} based on my submit > command. 
> My current workaround is to create a clusterrolebinding with edit rights for > the mynamespace:default account.
[jira] [Commented] (SPARK-28360) The serviceAccountName configuration item does not take effect in client mode.
[ https://issues.apache.org/jira/browse/SPARK-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024835#comment-17024835 ] Mathew Wicks commented on SPARK-28360: -- This issue was also reported for 2.4, but has not been fixed. > The serviceAccountName configuration item does not take effect in client mode. > -- > > Key: SPARK-28360 > URL: https://issues.apache.org/jira/browse/SPARK-28360 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: zhixingheyi_tian >Priority: Major > > From the configuration item description in the Spark documentation: > https://spark.apache.org/docs/latest/running-on-kubernetes.html > > “spark.kubernetes.authenticate.driver.serviceAccountName default Service > account that is used when running the driver pod. The driver pod uses this > service account when requesting executor pods from the API server. Note that > this cannot be specified alongside a CA cert file, client key file, client > cert file, and/or OAuth token. In client mode, use > spark.kubernetes.authenticate.serviceAccountName instead.” > But in client mode, “spark.kubernetes.authenticate.serviceAccountName” does > not actually take effect. > From an analysis of the source code, Spark never reads the configuration item > "spark.kubernetes.authenticate.serviceAccountName". > The unit tests only cover > "spark.kubernetes.authenticate.driver.serviceAccountName". > In Kubernetes, a service account provides an identity for processes that run > in a Pod. When you create a pod, if you do not specify a service account, it > is automatically assigned the default service account in the same namespace. > Setting "spec.serviceAccountName" when creating a pod specifies a custom > service account. > So in client mode, if you run your driver inside a Kubernetes pod, the > service account already exists. If your application is not running inside > a pod, no service account is needed at all. 
> From my point of view, we should just modify the document and delete the > "spark.kubernetes.authenticate.serviceAccountName" configuration item > description: it does not work at the moment, and it also does not need to > work.
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023439#comment-17023439 ] Mathew Wicks commented on SPARK-28921: -- [~dongjoon], it's simply bad practice not to update jars that depend on each other, so I never tried updating only one. However, I also remember other threads about this issue where people said they encountered errors when updating only one. > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. 
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE: > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022625#comment-17022625 ] Mathew Wicks commented on SPARK-28921: -- It is not enough to replace the kubernetes-client jar in your $SPARK_HOME/jars; you must also replace: * $SPARK_HOME/jars/kubernetes-client-*.jar * $SPARK_HOME/jars/kubernetes-model-common-*.jar * $SPARK_HOME/jars/kubernetes-model-*.jar * $SPARK_HOME/jars/okhttp-*.jar * $SPARK_HOME/jars/okio-*.jar with the versions specified in this commit: https://github.com/apache/spark/commit/65c0a7812b472147c615fb4fe779da9d0a11ff18 > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. 
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE: > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.
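The jar replacement described in the comment above can be scripted. This is a hedged sketch: the function name, the directory holding the replacement jars, and the exact jar versions are assumptions (the versions should come from the linked apache/spark commit):

```shell
# Sketch: swap the Kubernetes-related jars in an existing Spark install.
# The second argument is assumed to hold the replacement jars, built with
# the versions pinned in apache/spark commit 65c0a78.
swap_spark_k8s_jars() {
  local spark_home="$1" new_jars_dir="$2"
  # Remove every jar the kubernetes-client upgrade touches -- replacing only
  # kubernetes-client-*.jar leaves mismatched okhttp/okio/model jars behind.
  for pattern in 'kubernetes-client-*.jar' 'kubernetes-model-*.jar' \
                 'kubernetes-model-common-*.jar' 'okhttp-*.jar' 'okio-*.jar'; do
    # $pattern is intentionally unquoted so the glob expands;
    # rm -f ignores patterns that match nothing.
    rm -f "$spark_home"/jars/$pattern
  done
  # Copy the replacement jars into place.
  cp "$new_jars_dir"/*.jar "$spark_home/jars/"
}
```

Usage would be something like `swap_spark_k8s_jars "$SPARK_HOME" /path/to/new-jars`; unrelated jars in `$SPARK_HOME/jars` are left untouched.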
[jira] [Comment Edited] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022625#comment-17022625 ] Mathew Wicks edited comment on SPARK-28921 at 1/24/20 1:03 AM: --- It is not enough to replace the kubernetes-client jar in your $SPARK_HOME/jars; you must also replace: * $SPARK_HOME/jars/kubernetes-client-*.jar * $SPARK_HOME/jars/kubernetes-model-common-*.jar * $SPARK_HOME/jars/kubernetes-model-*.jar * $SPARK_HOME/jars/okhttp-*.jar * $SPARK_HOME/jars/okio-*.jar with the versions specified in this commit: [https://github.com/apache/spark/commit/65c0a7812b472147c615fb4fe779da9d0a11ff18] was (Author: thesuperzapper): It is not enough to replace the kubernetes-client jar in your $SPARK_HOME/jars; you must also replace: * $SPARK_HOME/jars/kubernetes-client-*.jar * $SPARK_HOME/jars/kubernetes-model-common-*.jar * $SPARK_HOME/jars/kubernetes-model-*.jar * $SPARK_HOME/jars/okhttp-*.jar * $SPARK_HOME/jars/okio-*.jar with the versions specified in this commit: https://github.com/apache/spark/commit/65c0a7812b472147c615fb4fe779da9d0a11ff18 > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. 
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE: > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.
[jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888495#comment-16888495 ] Mathew Wicks commented on SPARK-24632: -- I have an elegant solution for this: you can include a separate Python package which mirrors the class address of the Java objects you wrap. For example, in the PySpark API for XGBoost I created the following package for objects under *ml.dmlc.xgboost4j.scala.spark._* {code:java} ml/__init__.py ml/dmlc/__init__.py ml/dmlc/xgboost4j/__init__.py ml/dmlc/xgboost4j/scala/__init__.py ml/dmlc/xgboost4j/scala/spark/__init__.py {code} All __init__.py files are empty except the final one, which contains: {code:java} import sys from sparkxgb import xgboost # Allows Pipeline()/PipelineModel() with XGBoost stages to be loaded from disk. # Needed because they try to import Python objects from their Java location. sys.modules['ml.dmlc.xgboost4j.scala.spark'] = xgboost {code} My actual Python wrapper classes live under *sparkxgb.xgboost*. This works because PySpark will try to import from the Java address of the class, even though the wrapper is Python. For more context, see [the initial PR here|https://github.com/dmlc/xgboost/pull/4656]. > Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers > for persistence > -- > > Key: SPARK-24632 > URL: https://issues.apache.org/jira/browse/SPARK-24632 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: Joseph K. Bradley >Priority: Major > > This is a follow-up for [SPARK-17025], which allowed users to implement > Python PipelineStages in 3rd-party libraries, include them in Pipelines, and > use Pipeline persistence. This task is to make it easier for 3rd-party > libraries to have PipelineStages written in Java and then to use pyspark.ml > abstractions to create wrappers around those Java classes. 
This is currently > possible, except that users hit bugs around persistence. > I spent a bit of time thinking about this and wrote up thoughts and a proposal in the > doc linked below. Summary of proposal: > Require that 3rd-party libraries with Java classes with Python wrappers > implement a trait which provides the corresponding Python classpath in some > field: > {code} > trait PythonWrappable { > def pythonClassPath: String = … > } > MyJavaType extends PythonWrappable > {code} > This will not be required for MLlib wrappers, which we can handle specially. > One issue for this task will be that we may have trouble writing unit tests. > They would ideally test a Java class + Python wrapper class pair sitting > outside of pyspark.
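The `sys.modules` aliasing trick from the comment above can be demonstrated without Spark or XGBoost at all: registering a module object under the Java-side dotted path lets `from <java.package.path> import X` resolve to the Python wrapper. The wrapper class and module names below are illustrative stand-ins, not the real sparkxgb code:

```python
import sys
import types

# Stand-in for the real wrapper module (plays the role of sparkxgb.xgboost).
wrappers = types.ModuleType("sparkxgb.xgboost")

class XGBoostClassifier:  # hypothetical Python wrapper class
    pass

wrappers.XGBoostClassifier = XGBoostClassifier

# Register the wrapper module under the JVM-side package path. The empty
# parent modules play the role of the empty __init__.py files in the real
# package, so the import machinery can resolve every level of the chain.
java_path = "ml.dmlc.xgboost4j.scala.spark"
parts = java_path.split(".")
for i in range(1, len(parts)):
    name = ".".join(parts[:i])
    sys.modules.setdefault(name, types.ModuleType(name))
sys.modules[java_path] = wrappers

# Persistence code that imports by the Java class path now gets the wrapper:
from ml.dmlc.xgboost4j.scala.spark import XGBoostClassifier as Loaded
print(Loaded is XGBoostClassifier)  # True
```

This is exactly why PipelineModel loading works: the loader only knows the Java class path recorded in the pipeline metadata, and the alias makes that path importable from Python.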
[jira] [Created] (SPARK-28032) DataFrame.saveAsTable in AVRO format with Timestamps creates bad Hive tables
Mathew Wicks created SPARK-28032: Summary: DataFrame.saveAsTable in AVRO format with Timestamps creates bad Hive tables Key: SPARK-28032 URL: https://issues.apache.org/jira/browse/SPARK-28032 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Environment: Spark 2.4.3 Hive 1.1.0 Reporter: Mathew Wicks I am not sure if it's my very old version of Hive (1.1.0), but when I use the following code, I end up with a table which Spark can read, but Hive cannot. That is to say, when writing AVRO format tables, they cannot be read in Hive if they contain timestamp types. *Hive error:* {code:java} Error while compiling statement: FAILED: UnsupportedOperationException timestamp is not supported. {code} *Spark Code:* {code:java} import java.sql.Timestamp import spark.implicits._ val currentTime = new Timestamp(System.currentTimeMillis()) val df = Seq( (currentTime) ).toDF() df.write.mode("overwrite").format("avro").saveAsTable("database.table_name") {code}
[jira] [Commented] (SPARK-28008) Default values & column comments in AVRO schema converters
[ https://issues.apache.org/jira/browse/SPARK-28008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862699#comment-16862699 ] Mathew Wicks commented on SPARK-28008: -- The only issue I can think of is that the column comments wouldn't be saved (which some users might want). While I agree the API doesn't seem like it should be public, it is useful to know what schema a dataframe will be written with (some Spark types have to be converted for Avro). Also, the user might want to make changes and then use the "avroSchema" writer option, for example, writing timestamps as the "timestamp-millis" type rather than "timestamp-micros". Beyond that, is there really harm in having a more correct conversion from a StructType into an Avro Schema? > Default values & column comments in AVRO schema converters > -- > > Key: SPARK-28008 > URL: https://issues.apache.org/jira/browse/SPARK-28008 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Mathew Wicks >Priority: Major > > Currently in both `toAvroType` and `toSqlType` > [SchemaConverters.scala#L134|https://github.com/apache/spark/blob/branch-2.4/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L134] > there are two behaviours which are unexpected. > h2. 
Nullable fields in spark are converted to UNION[TYPE, NULL] and no > default value is set: > *Current Behaviour:* > {code:java} > import org.apache.spark.sql.avro.SchemaConverters > import org.apache.spark.sql.types._ > val schema = new StructType().add("a", "string", nullable = true) > val avroSchema = SchemaConverters.toAvroType(schema) > println(avroSchema.toString(true)) > { > "type" : "record", > "name" : "topLevelRecord", > "fields" : [ { > "name" : "a", > "type" : [ "string", "null" ] > } ] > } > {code} > *Expected Behaviour:* > (NOTE: The reversal of "null" & "string" in the union, needed for a default > value of null) > {code:java} > import org.apache.spark.sql.avro.SchemaConverters > import org.apache.spark.sql.types._ > val schema = new StructType().add("a", "string", nullable = true) > val avroSchema = SchemaConverters.toAvroType(schema) > println(avroSchema.toString(true)) > { > "type" : "record", > "name" : "topLevelRecord", > "fields" : [ { > "name" : "a", > "type" : [ "null", "string" ], > "default" : null > } ] > }{code} > h2. 
Field comments/metadata is not propagated: > *Current Behaviour:* > {code:java} > import org.apache.spark.sql.avro.SchemaConverters > import org.apache.spark.sql.types._ > val schema = new StructType().add("a", "string", nullable=false, > comment="AAA") > val avroSchema = SchemaConverters.toAvroType(schema) > println(avroSchema.toString(true)) > { > "type" : "record", > "name" : "topLevelRecord", > "fields" : [ { > "name" : "a", > "type" : "string" > } ] > }{code} > *Expected Behaviour:* > {code:java} > import org.apache.spark.sql.avro.SchemaConverters > import org.apache.spark.sql.types._ > val schema = new StructType().add("a", "string", nullable=false, > comment="AAA") > val avroSchema = SchemaConverters.toAvroType(schema) > println(avroSchema.toString(true)) > { > "type" : "record", > "name" : "topLevelRecord", > "fields" : [ { > "name" : "a", > "type" : "string", > "doc" : "AAA" > } ] > }{code} > > The behaviour should be similar (but the reverse) for `toSqlType`. > I think we should aim to get this in before 3.0, as it will probably be a > breaking change for some usage of the AVRO API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28008) Default values & column comments in AVRO schema converters
Mathew Wicks created SPARK-28008: Summary: Default values & column comments in AVRO schema converters Key: SPARK-28008 URL: https://issues.apache.org/jira/browse/SPARK-28008 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Mathew Wicks Currently in both `toAvroType` and `toSqlType` [SchemaConverters.scala#L134|https://github.com/apache/spark/blob/branch-2.4/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L134] there are two behaviours which are unexpected. h2. Nullable fields in spark are converted to UNION[TYPE, NULL] and no default value is set: *Current Behaviour:* {code:java} import org.apache.spark.sql.avro.SchemaConverters import org.apache.spark.sql.types._ val schema = new StructType().add("a", "string", nullable = true) val avroSchema = SchemaConverters.toAvroType(schema) println(avroSchema.toString(true)) { "type" : "record", "name" : "topLevelRecord", "fields" : [ { "name" : "a", "type" : [ "string", "null" ] } ] } {code} *Expected Behaviour:* (NOTE: The reversal of "null" & "string" in the union, needed for a default value of null) {code:java} import org.apache.spark.sql.avro.SchemaConverters import org.apache.spark.sql.types._ val schema = new StructType().add("a", "string", nullable = true) val avroSchema = SchemaConverters.toAvroType(schema) println(avroSchema.toString(true)) { "type" : "record", "name" : "topLevelRecord", "fields" : [ { "name" : "a", "type" : [ "null", "string" ], "default" : null } ] }{code} h2. 
Field comments/metadata is not propagated: *Current Behaviour:* {code:java} import org.apache.spark.sql.avro.SchemaConverters import org.apache.spark.sql.types._ val schema = new StructType().add("a", "string", nullable=false, comment="AAA") val avroSchema = SchemaConverters.toAvroType(schema) println(avroSchema.toString(true)) { "type" : "record", "name" : "topLevelRecord", "fields" : [ { "name" : "a", "type" : "string" } ] }{code} *Expected Behaviour:* {code:java} import org.apache.spark.sql.avro.SchemaConverters import org.apache.spark.sql.types._ val schema = new StructType().add("a", "string", nullable=false, comment="AAA") val avroSchema = SchemaConverters.toAvroType(schema) println(avroSchema.toString(true)) { "type" : "record", "name" : "topLevelRecord", "fields" : [ { "name" : "a", "type" : "string", "doc" : "AAA" } ] }{code} The behaviour should be similar (but the reverse) for `toSqlType`. I think we should aim to get this in before 3.0, as it will probably be a breaking change for some usage of the AVRO API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
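The union-ordering change requested above hinges on an Avro rule: a field's default value must match the first branch of its union, so a nullable field with a null default has to be written as ["null", "string"] rather than ["string", "null"]. A minimal sketch of that reordering, using a hypothetical helper rather than the actual SchemaConverters code:

```python
# Sketch only: Avro validates a field's "default" against the FIRST branch
# of its union, so a nullable field needs "null" first for "default": null
# to be legal.
def null_first(union):
    """Reorder a union (list of Avro type names) so "null" comes first."""
    if "null" in union:
        return ["null"] + [t for t in union if t != "null"]
    return union
```

Applied to the schema in the report, `null_first(["string", "null"])` yields the union the "Expected Behaviour" block shows.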
[jira] [Comment Edited] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847192#comment-16847192 ] Mathew Wicks edited comment on SPARK-17477 at 5/24/19 2:52 AM: --- *UPDATE:* Sorry, I was mistaken: this is still an issue in Spark 2.4. It only seems to occur when "spark.sql.parquet.writeLegacyFormat=false"; when I set "spark.sql.parquet.writeLegacyFormat=true" the issue goes away (for Hive 1.1.0 and Spark 2.4.3). was (Author: thesuperzapper): This only seems to be an issue when "spark.sql.parquet.writeLegacyFormat=false"; when I set "spark.sql.parquet.writeLegacyFormat=true" the issue goes away (for Hive 1.1.0 and Spark 2.4.3). > SparkSQL cannot handle schema evolution from Int -> Long when parquet files > have Int as its type while hive metastore has Long as its type > -- > > Key: SPARK-17477 > URL: https://issues.apache.org/jira/browse/SPARK-17477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Gang Wu >Priority: Major > > When using SparkSession to read a Hive table that is stored as parquet > files, a column may have undergone a schema evolution from int to long: > some old parquet files use int for the column while some new parquet files > use long, and in the Hive metastore the type is long (bigint). 
> Therefore when I use the following: > {quote} > sparkSession.sql("select * from table").show() > {quote} > I got the following exception: > {quote} > 16/08/29 17:50:20 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 > (TID 91, XXX): org.apache.parquet.io.ParquetDecodingException: Can not read > value at 0 in block 0 in file > hdfs://path/to/parquet/1-part-r-0-d8e4f5aa-b6b9-4cad-8432-a7ae7a590a93.gz.parquet > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36) > at > scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:246) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setInt(ParquetRowConverter.scala:161) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addInt
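The ClassCastException at the bottom of the trace comes from storing a file-level 32-bit int into a row slot typed for the metastore's bigint. A hedged sketch of the widening dispatch a fix would need, with illustrative names rather than Spark's actual ParquetRowConverter API:

```python
# Hypothetical sketch: when the file's physical type is narrower than the
# table's declared type, widen the value instead of failing with a
# ClassCastException-style error. Only value-preserving upcasts are allowed.
SAFE_WIDENINGS = {("int", "bigint"), ("int", "double"), ("float", "double")}

def read_value(file_value, file_type, table_type):
    """Return file_value coerced to the table's declared type, or raise."""
    if file_type == table_type:
        return file_value
    if (file_type, table_type) in SAFE_WIDENINGS:
        # Widening preserves the value; in the JVM reader this would be
        # an Int -> Long (or Float -> Double) conversion.
        return file_value
    raise TypeError(f"cannot read {file_type} column as {table_type}")
```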
[jira] [Issue Comment Deleted] (SPARK-16544) Support for conversion from compatible schema for Parquet data source when data types are not matched
[ https://issues.apache.org/jira/browse/SPARK-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathew Wicks updated SPARK-16544: - Comment: was deleted (was: This only seems to be an issue if "spark.sql.parquet.writeLegacyFormat=false" when I set "spark.sql.parquet.writeLegacyFormat=true" this issue goes away. (For Hive 1.1.0 and Spark 2.4.3)) > Support for conversion from compatible schema for Parquet data source when > data types are not matched > - > > Key: SPARK-16544 > URL: https://issues.apache.org/jira/browse/SPARK-16544 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0, 2.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > This deals with scenario 1 - case - 1 from the parent issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847192#comment-16847192 ] Mathew Wicks commented on SPARK-17477: -- This only seems to be an issue when "spark.sql.parquet.writeLegacyFormat=false"; when I set "spark.sql.parquet.writeLegacyFormat=true" the issue goes away (for Hive 1.1.0 and Spark 2.4.3). > SparkSQL cannot handle schema evolution from Int -> Long when parquet files > have Int as its type while hive metastore has Long as its type > -- > > Key: SPARK-17477 > URL: https://issues.apache.org/jira/browse/SPARK-17477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Gang Wu >Priority: Major > > When using SparkSession to read a Hive table that is stored as parquet > files, a column may have undergone a schema evolution from int to long: > some old parquet files use int for the column while some new parquet files > use long, and in the Hive metastore the type is long (bigint). 
> Therefore when I use the following: > {quote} > sparkSession.sql("select * from table").show() > {quote} > I got the following exception: > {quote} > 16/08/29 17:50:20 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 > (TID 91, XXX): org.apache.parquet.io.ParquetDecodingException: Can not read > value at 0 in block 0 in file > hdfs://path/to/parquet/1-part-r-0-d8e4f5aa-b6b9-4cad-8432-a7ae7a590a93.gz.parquet > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36) > at > scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:246) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setInt(ParquetRowConverter.scala:161) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addInt(ParquetRowConverter.scala:85) > at > org.apache.parquet.column.impl.ColumnReaderImpl$2$3.writeValue(ColumnReaderImpl.java:249) > at > org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:365) > at > org.apache.parquet.io.RecordReaderImplementation.rea
[jira] [Commented] (SPARK-16544) Support for conversion from compatible schema for Parquet data source when data types are not matched
[ https://issues.apache.org/jira/browse/SPARK-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847193#comment-16847193 ] Mathew Wicks commented on SPARK-16544: -- This only seems to be an issue when "spark.sql.parquet.writeLegacyFormat=false"; when I set "spark.sql.parquet.writeLegacyFormat=true" the issue goes away (for Hive 1.1.0 and Spark 2.4.3). > Support for conversion from compatible schema for Parquet data source when > data types are not matched > - > > Key: SPARK-16544 > URL: https://issues.apache.org/jira/browse/SPARK-16544 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0, 2.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > This deals with scenario 1 - case - 1 from the parent issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26388) No support for "alter table .. replace columns" to drop columns
[ https://issues.apache.org/jira/browse/SPARK-26388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846528#comment-16846528 ] Mathew Wicks commented on SPARK-26388: -- After a bit of investigating, it seems like the HiveExternalCatalog API has most of the needed functionality already, so just the SQL part needs to be implemented. If there were a table named "database_name.table_name", you could overwrite its schema with this: {code:scala} import org.apache.spark.sql.types.StructType val schema = new StructType() .add("a", "string", nullable = true) .add("b", "string", nullable = true) .add("c", "string", nullable = true) spark.sharedState.externalCatalog.alterTableDataSchema("database_name", "table_name", schema) {code} Here is a link to the alterTableDataSchema() method: [https://github.com/apache/spark/blob/branch-2.4/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L647] > No support for "alter table .. replace columns" to drop columns > --- > > Key: SPARK-26388 > URL: https://issues.apache.org/jira/browse/SPARK-26388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.1, 2.3.2 >Reporter: nirav patel >Priority: Major > > Looks like Hive {{replace columns}} is not working with Spark 2.2.1 and 2.3.1 > > create table myschema.mytable(a int, b int, c int) > alter table myschema.mytable replace columns (a int,b int,d int) > > *Expected Behavior* > It should drop column c and add column d. > alter table ... replace columns ... should work just as it does in Hive: > it replaces the existing columns with the new ones, deleting any column that > is not mentioned. 
> > Here's the snippet of the Hive CLI: > hive> desc mytable; > OK > a int > b int > c int > Time taken: 0.05 seconds, Fetched: 3 row(s) > hive> alter table mytable replace columns(a int, b int, d int); > OK > Time taken: 0.078 seconds > hive> desc mytable; > OK > a int > b int > d int > Time taken: 0.03 seconds, Fetched: 3 row(s) > > *Actual Result* > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: > alter table replace columns > {{ADD COLUMNS}} works, which seems to have been previously reported and fixed as well: > https://issues.apache.org/jira/browse/SPARK-18893 > > Replace columns should be supported as well; AFAIK, that's the only way to > delete Hive columns. > > > It is supposed to work according to these docs: > > [https://docs.databricks.com/spark/latest/spark-sql/language-manual/alter-table-or-view.html#replace-columns] > > [https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#supported-hive-features] > > but it throws an error for me on 2 different versions. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
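For reference, the REPLACE COLUMNS semantics described in the report (the new column list wholly replaces the old one, so unmentioned columns are dropped and new ones added) can be sketched as a simple diff. This is illustrative only, not Spark or Hive internals:

```python
# Sketch of Hive's REPLACE COLUMNS semantics: given the old and new column
# lists, compute which columns are dropped (present before, unmentioned now)
# and which are added (mentioned now, absent before).
def replace_columns_diff(old_cols, new_cols):
    dropped = [c for c in old_cols if c not in new_cols]
    added = [c for c in new_cols if c not in old_cols]
    return dropped, added
```

With the report's example, replacing (a, b, c) by (a, b, d) drops c and adds d, matching the Hive CLI transcript above.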
[jira] [Commented] (SPARK-26388) No support for "alter table .. replace columns" to drop columns
[ https://issues.apache.org/jira/browse/SPARK-26388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846490#comment-16846490 ] Mathew Wicks commented on SPARK-26388: -- This test suite seems to imply this feature is not supported: [https://github.com/apache/spark/blob/branch-2.4/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala#L460] > No support for "alter table .. replace columns" to drop columns > --- > > Key: SPARK-26388 > URL: https://issues.apache.org/jira/browse/SPARK-26388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.1, 2.3.2 >Reporter: nirav patel >Priority: Major > > Looks like Hive {{replace columns}} is not working with Spark 2.2.1 and 2.3.1 > > create table myschema.mytable(a int, b int, c int) > alter table myschema.mytable replace columns (a int,b int,d int) > > *Expected Behavior* > It should drop column c and add column d. > alter table ... replace columns ... should work just as it does in Hive: > it replaces the existing columns with the new ones, deleting any column that > is not mentioned. > > Here's the snippet of the Hive CLI: > hive> desc mytable; > OK > a int > b int > c int > Time taken: 0.05 seconds, Fetched: 3 row(s) > hive> alter table mytable replace columns(a int, b int, d int); > OK > Time taken: 0.078 seconds > hive> desc mytable; > OK > a int > b int > d int > Time taken: 0.03 seconds, Fetched: 3 row(s) > > *Actual Result* > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: > alter table replace columns > {{ADD COLUMNS}} works, which seems to have been previously reported and fixed as well: > https://issues.apache.org/jira/browse/SPARK-18893 > > Replace columns should be supported as well; AFAIK, that's the only way to > delete Hive columns. 
> > > It is supposed to work according to these docs: > > [https://docs.databricks.com/spark/latest/spark-sql/language-manual/alter-table-or-view.html#replace-columns] > > [https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#supported-hive-features] > > but it throws an error for me on 2 different versions. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26388) No support for "alter table .. replace columns" to drop columns
[ https://issues.apache.org/jira/browse/SPARK-26388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846469#comment-16846469 ] Mathew Wicks commented on SPARK-26388: -- This appears to still be an issue in Spark 2.4.3. For all queries involving "ALTER TABLE table_name REPLACE COLUMNS (col_name STRING, ...)" you get: {code:java} Operation not allowed: ALTER TABLE REPLACE COLUMNS(line 1, pos 0){code} At the very least, we need to highlight this in the docs, as we currently say we support all Hive ALTER TABLE commands here: [https://spark.apache.org/docs/2.4.0/sql-migration-guide-hive-compatibility.html#supported-hive-features] > No support for "alter table .. replace columns" to drop columns > --- > > Key: SPARK-26388 > URL: https://issues.apache.org/jira/browse/SPARK-26388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.1, 2.3.2 >Reporter: nirav patel >Priority: Major > > Looks like Hive {{replace columns}} is not working with Spark 2.2.1 and 2.3.1 > > create table myschema.mytable(a int, b int, c int) > alter table myschema.mytable replace columns (a int,b int,d int) > > *Expected Behavior* > It should drop column c and add column d. > alter table ... replace columns ... should work just as it does in Hive: > it replaces the existing columns with the new ones, deleting any column that > is not mentioned. 
> > Here's the snippet of the Hive CLI: > hive> desc mytable; > OK > a int > b int > c int > Time taken: 0.05 seconds, Fetched: 3 row(s) > hive> alter table mytable replace columns(a int, b int, d int); > OK > Time taken: 0.078 seconds > hive> desc mytable; > OK > a int > b int > d int > Time taken: 0.03 seconds, Fetched: 3 row(s) > > *Actual Result* > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: > alter table replace columns > {{ADD COLUMNS}} works, which seems to have been previously reported and fixed as well: > https://issues.apache.org/jira/browse/SPARK-18893 > > Replace columns should be supported as well; AFAIK, that's the only way to > delete Hive columns. > > > It is supposed to work according to these docs: > > [https://docs.databricks.com/spark/latest/spark-sql/language-manual/alter-table-or-view.html#replace-columns] > > [https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#supported-hive-features] > > but it throws an error for me on 2 different versions. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21423) MODE average aggregate function.
Mathew Wicks created SPARK-21423: Summary: MODE average aggregate function. Key: SPARK-21423 URL: https://issues.apache.org/jira/browse/SPARK-21423 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.2.0 Reporter: Mathew Wicks Priority: Minor Having a MODE() aggregate function which returns the mode average of a group/window would be very useful. For example, if the column type is a number, it finds the most common number, and if the column type is a string, it finds the most common string. I appreciate that doing this in a scalable way will require some thinking/discussion. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
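For intuition, the single-node semantics of such a MODE() aggregate look like the sketch below. A scalable Spark implementation would additionally need a partial-aggregation design (per-partition counts merged into a final maximum), which this sketch deliberately ignores:

```python
from collections import Counter

# Naive single-node mode: count values, keep a most-frequent one. Works for
# numbers and strings alike, as the issue asks. Ties are broken arbitrarily
# here; a real aggregate function would have to define (or at least
# document) its tie-breaking behaviour.
def mode(values):
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]
```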
[jira] [Created] (SPARK-20353) Implement Tensorflow TFRecords file format
Mathew Wicks created SPARK-20353: Summary: Implement TensorFlow TFRecords file format Key: SPARK-20353 URL: https://issues.apache.org/jira/browse/SPARK-20353 Project: Spark Issue Type: Improvement Components: Input/Output, SQL Affects Versions: 2.1.0 Reporter: Mathew Wicks Spark is a very good preprocessing engine for tools like TensorFlow. However, we lack native support for TensorFlow's core file format, TFRecords. There is a project which implements this functionality as an external JAR (but it is not user-friendly or robust enough for production use): https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector Here is some discussion around the above: https://github.com/tensorflow/ecosystem/issues/32 If we were to implement "tfrecords" as a DataFrame writable/readable format, we would have to account for the various data types that can be present in Spark columns, and which ones are actually useful in TensorFlow. Note: the `spark-tensorflow-connector` described above does not properly support the vector data type. Further discussion of whether this is within the scope of Spark SQL is strongly welcomed. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20207) Add ability to exclude current row in WindowSpec
[ https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956403#comment-15956403 ] Mathew Wicks commented on SPARK-20207: -- A toy example is given in the Stack Overflow post. An alternative solution would be to implement array concatenation: for most aggregations you can split the calculation into the 'before current row' and 'after current row' partitions (think SUM()), but for functions like COLLECT_LIST() this is not possible. There is precedent for array concatenation in SQL, for example ARRAY_CONCAT() in BigQuery or ARRAY_CAT() in PostgreSQL. http://www.w3resource.com/PostgreSQL/postgresql_array_cat-function.php > Add ability to exclude current row in WindowSpec > --- > > Key: SPARK-20207 > URL: https://issues.apache.org/jira/browse/SPARK-20207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mathew Wicks >Priority: Minor > > It would be useful if we could implement a way to exclude the current row in > WindowSpec. (We can currently only select ranges of rows/time.) > Currently, users have to resort to ridiculous measures to exclude the current > row from windowing aggregations. > As seen here: > http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
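The before/after split described in the comment above, concatenating the 'before current row' partition with the 'after current row' partition, can be sketched for COLLECT_LIST on a plain sequence. This is illustrative only, not a Spark API:

```python
# For each position i in the partition, concatenate the slice before the
# current row with the slice after it. This is exactly the split that works
# for SUM() but requires array concatenation for COLLECT_LIST().
def collect_list_excluding_current(partition):
    return [partition[:i] + partition[i + 1:] for i in range(len(partition))]
```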
[jira] [Commented] (SPARK-20207) Add ability to exclude current row in WindowSpec
[ https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956372#comment-15956372 ] Mathew Wicks commented on SPARK-20207: -- Well, technically no, but it would be a good place to implement this functionality. Where would you suggest implementing it? > Add ability to exclude current row in WindowSpec > --- > > Key: SPARK-20207 > URL: https://issues.apache.org/jira/browse/SPARK-20207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mathew Wicks >Priority: Minor > > It would be useful if we could implement a way to exclude the current row in > WindowSpec. (We can currently only select ranges of rows/time.) > Currently, users have to resort to ridiculous measures to exclude the current > row from windowing aggregations. > As seen here: > http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20207) Add ability to exclude current row in WindowSpec
[ https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956363#comment-15956363 ] Mathew Wicks commented on SPARK-20207: -- Sorry, I was a bit unclear: we would want to define a window over every row in the partition except the current row. > Add ability to exclude current row in WindowSpec > --- > > Key: SPARK-20207 > URL: https://issues.apache.org/jira/browse/SPARK-20207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mathew Wicks >Priority: Minor > > It would be useful if we could implement a way to exclude the current row in > WindowSpec. (We can currently only select ranges of rows/time.) > Currently, users have to resort to ridiculous measures to exclude the current > row from windowing aggregations. > As seen here: > http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20207) Add ability to exclude current row in WindowSpec
Mathew Wicks created SPARK-20207: Summary: Add ability to exclude current row in WindowSpec Key: SPARK-20207 URL: https://issues.apache.org/jira/browse/SPARK-20207 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Mathew Wicks Priority: Minor It would be useful if we could implement a way to exclude the current row in WindowSpec. (We can currently only select ranges of rows/time.) Currently, users have to resort to ridiculous measures to exclude the current row from windowing aggregations, as seen here: http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
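As a side note on cost: for invertible aggregates like SUM, excluding the current row reduces to the whole-partition total minus the row's own value, which is one reason the feature is cheap for some functions and hard for others like COLLECT_LIST. An illustrative sketch, not a Spark API:

```python
# SUM over "all rows in the partition except the current one": compute the
# partition total once, then subtract each row's own value. This one-pass
# trick only works because SUM is invertible.
def sum_excluding_current(partition):
    total = sum(partition)
    return [total - v for v in partition]
```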