[jira] [Commented] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes

2022-03-01 Thread Zach (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499785#comment-17499785
 ] 

Zach commented on SPARK-38380:
--

If it helps for discussion purposes, I'm happy to stage a draft PR with my idea 
and link it here. 

> Adding a demo/walkthrough section Running Spark on Kubernetes
> -
>
> Key: SPARK-38380
> URL: https://issues.apache.org/jira/browse/SPARK-38380
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: Zach
>Priority: Minor
>
> I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 
> Documentation 
> (apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration]
>  that walks a user through the 'happy path' of:
>  # creating and configuring a cluster
>  # preparing an example spark job
>  # adding the JAR to the container image
>  # submitting the job to the cluster using spark-submit
>  # getting the results
> The current guide covers much of this in the abstract, but I have to do a lot 
> of searching when walking through setting this up on Kubernetes for the first 
> time. I feel this would significantly improve the guide.
> The first section could be made extensible to cover local demo clusters 
> (minikube, kind) as well as cloud providers (Amazon, Google, Azure).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes

2022-03-01 Thread Zach (Jira)
Zach created SPARK-38380:


 Summary: Adding a demo/walkthrough section Running Spark on 
Kubernetes
 Key: SPARK-38380
 URL: https://issues.apache.org/jira/browse/SPARK-38380
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.2.1
Reporter: Zach


I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 
Documentation 
(apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration]
 that walks a user through the 'happy path' of:
 # creating and configuring a cluster
 # preparing an example spark job
 # adding the JAR to the container image
 # submitting the job to the cluster using spark-submit
 # getting the results

The current guide covers much of this in the abstract, but I have to do a lot of 
searching when walking through setting this up on Kubernetes for the first time. 
I feel this would significantly improve the guide.

The first section could be made extensible to cover local demo clusters (minikube, 
kind) as well as cloud providers (Amazon, Google, Azure).
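As a starting point for discussion, a sketch of what the happy path might look like on a local minikube cluster. The namespace, service account, and image tag below are illustrative choices, not fixed names; `docker-image-tool.sh`, `SparkPi`, and the examples JAR ship with the Spark distribution, and the exact examples JAR filename depends on the Spark/Scala version.

```shell
# 1. Create and configure a local demo cluster.
minikube start --cpus 4 --memory 8192
kubectl create namespace spark-demo
kubectl create serviceaccount spark -n spark-demo
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=spark-demo:spark

# 2-3. Prepare an example job and bake the JARs into a container image.
#      Run from the root of the Spark distribution; -m targets minikube's
#      Docker daemon, and the "demo" tag is illustrative.
./bin/docker-image-tool.sh -m -t demo build

# 4. Submit the job to the cluster with spark-submit.
./bin/spark-submit \
  --master "k8s://$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')" \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=spark-demo \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=spark:demo \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar

# 5. Get the results from the driver pod logs.
kubectl logs -n spark-demo -l spark-role=driver
```

For a cloud provider, only step 1 and the `--master` URL would change, which is what makes the first section a natural extension point.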






[jira] [Commented] (SPARK-29005) build failed: sparkSession.createDataFrame[Int](Seq(1,2))

2019-09-07 Thread Zhou Zach (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925048#comment-16925048
 ] 

Zhou Zach commented on SPARK-29005:
---

[~angerszhuuu]

[~hyukjin.kwon]

Thanks a lot

> build failed: sparkSession.createDataFrame[Int](Seq(1,2))
> -
>
> Key: SPARK-29005
> URL: https://issues.apache.org/jira/browse/SPARK-29005
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Zhou Zach
>Priority: Trivial
>  Labels: build
>
>  
> Intellij Idea report:
> Error:(31, 18) type arguments [Int] conform to the bounds of none of the 
> overloaded alternatives of
>  value createDataFrame: [A <: Product](data: Seq[A])(implicit evidence$3: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame  [A 
> <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
>  sparkSession.createDataFrame[Int](Seq(1,2))
>  
> Error:(31, 18) wrong number of type parameters for overloaded method value 
> createDataFrame with alternatives:
>  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame 
> 
>  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame 
>  (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame 
>  (rows: java.util.List[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  [A <: Product](data: Seq[A])(implicit evidence$3: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame 
>  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
>  sparkSession.createDataFrame[Int](Seq(1,2))



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29005) build failed: sparkSession.createDataFrame[Int](Seq(1,2))

2019-09-07 Thread Zhou Zach (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924909#comment-16924909
 ] 

Zhou Zach commented on SPARK-29005:
---

[~angerszhuuu] Thanks for the reply, but I am confused about it. I found this:

/**
 * An encoder for Scala's primitive int type.
 * @since 2.0.0
 */
def scalaInt: Encoder[Int] = ExpressionEncoder()

 

What subclasses does the trait `Product` have?

 

If I want to create a DataFrame from a Seq, I can only use Seq(1,2,3).toDF("ids"), 
or sparkSession.createDataFrame(List(Row.fromSeq(Seq(1,2,3))).asJava, schema), but 
I don't think that is concise.
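To summarize the alternatives: `createDataFrame(data: Seq[A])` is bounded by `A <: Product`, which `Int` does not satisfy (case classes and tuples do), so a primitive element type has to go through `toDF`, a typed `Dataset`, a `Tuple1` wrapper, or explicit `Row`s with a schema. A minimal sketch, assuming a local `SparkSession` named `spark`:

```scala
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import scala.collection.JavaConverters._

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// 1. toDF: Int has an implicit primitive Encoder, so this compiles.
val df1 = Seq(1, 2, 3).toDF("ids")

// 2. createDataset accepts primitive encoders such as Encoders.scalaInt
//    (the scalaInt encoder quoted in the comment above).
val ds = spark.createDataset(Seq(1, 2, 3))(Encoders.scalaInt)

// 3. Tuple1 is a Product, so wrapping satisfies createDataFrame's bound.
val df2 = spark.createDataFrame(Seq(1, 2, 3).map(Tuple1(_))).toDF("ids")

// 4. Explicit Rows plus a schema, the verbose variant from the comment.
val schema = StructType(Seq(StructField("ids", IntegerType)))
val df3 = spark.createDataFrame(Seq(1, 2, 3).map(Row(_)).asJava, schema)
```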

> build failed: sparkSession.createDataFrame[Int](Seq(1,2))
> -
>
> Key: SPARK-29005
> URL: https://issues.apache.org/jira/browse/SPARK-29005
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Zhou Zach
>Priority: Trivial
>  Labels: build
>
>  
> Intellij Idea report:
> Error:(31, 18) type arguments [Int] conform to the bounds of none of the 
> overloaded alternatives of
>  value createDataFrame: [A <: Product](data: Seq[A])(implicit evidence$3: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame  [A 
> <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
>  sparkSession.createDataFrame[Int](Seq(1,2))
>  
> Error:(31, 18) wrong number of type parameters for overloaded method value 
> createDataFrame with alternatives:
>  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame 
> 
>  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame 
>  (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame 
>  (rows: java.util.List[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
>  [A <: Product](data: Seq[A])(implicit evidence$3: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame 
>  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
> reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
>  sparkSession.createDataFrame[Int](Seq(1,2))






[jira] [Created] (SPARK-29005) build failed: sparkSession.createDataFrame[Int](Seq(1,2))

2019-09-05 Thread Zhou Zach (Jira)
Zhou Zach created SPARK-29005:
-

 Summary: build failed: sparkSession.createDataFrame[Int](Seq(1,2))
 Key: SPARK-29005
 URL: https://issues.apache.org/jira/browse/SPARK-29005
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Zhou Zach


 

Intellij Idea report:

Error:(31, 18) type arguments [Int] conform to the bounds of none of the 
overloaded alternatives of
 value createDataFrame: [A <: Product](data: Seq[A])(implicit evidence$3: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame  [A <: 
Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
 sparkSession.createDataFrame[Int](Seq(1,2))

 

Error:(31, 18) wrong number of type parameters for overloaded method value 
createDataFrame with alternatives:
 (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame 

 (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
Class[_])org.apache.spark.sql.DataFrame 
 (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
Class[_])org.apache.spark.sql.DataFrame 
 (rows: java.util.List[org.apache.spark.sql.Row],schema: 
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
 (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
 (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
 [A <: Product](data: Seq[A])(implicit evidence$3: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame 
 [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$2: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
 sparkSession.createDataFrame[Int](Seq(1,2))






[jira] [Created] (SPARK-28991) What does the spark submit --conf spark.lineage.enabled parameter mean?

2019-09-05 Thread Zhou Zach (Jira)
Zhou Zach created SPARK-28991:
-

 Summary: What does the spark submit --conf spark.lineage.enabled 
parameter mean?
 Key: SPARK-28991
 URL: https://issues.apache.org/jira/browse/SPARK-28991
 Project: Spark
  Issue Type: Story
  Components: Spark Submit
Affects Versions: 2.3.0
Reporter: Zhou Zach


What does the spark submit --conf spark.lineage.enabled parameter mean? I searched 
the official site and the web but found nothing. What is the benefit of configuring 
this parameter?






[jira] [Created] (SPARK-28600) the method agg of KeyValueGroupedDataset has 4 TypedColumn limits

2019-08-01 Thread Zhou Zach (JIRA)
Zhou Zach created SPARK-28600:
-

 Summary:  the method  agg of KeyValueGroupedDataset has 4 
TypedColumn limits
 Key: SPARK-28600
 URL: https://issues.apache.org/jira/browse/SPARK-28600
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Zhou Zach


Why does the agg method of KeyValueGroupedDataset have a limit of 4 TypedColumn 
parameters? Is it because of performance, or too much overhead? In my case I need 
more than 4 aggregate metrics at the same time, so I can only aggregate 4 metrics 
and join the results by key, and the join wastes time. I wish the agg method 
accepted more than 4 TypedColumn parameters.
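For context, a minimal sketch of the workaround described above, assuming Spark 2.x with the typed aggregators from org.apache.spark.sql.expressions.scalalang; the Event record and the five metrics are hypothetical, for illustration only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.scalalang.typed

// Hypothetical record with five metrics to aggregate per key.
case class Event(key: String, a: Double, b: Double, c: Double, d: Double, e: Double)

val spark = SparkSession.builder().master("local[*]").appName("agg-demo").getOrCreate()
import spark.implicits._

val events = Seq(Event("k", 1, 2, 3, 4, 5)).toDS()
val grouped = events.groupByKey(_.key)

// First agg: four TypedColumns is the most this overload accepts in 2.3.
val first = grouped.agg(
  typed.sum[Event](_.a).name("sumA"),
  typed.sum[Event](_.b).name("sumB"),
  typed.sum[Event](_.c).name("sumC"),
  typed.sum[Event](_.d).name("sumD"))

// The fifth metric needs a second agg, joined back on the key --
// the extra join the report complains about.
val rest = grouped.agg(typed.sum[Event](_.e).name("sumE"))
val all = first.toDF("key", "sumA", "sumB", "sumC", "sumD")
  .join(rest.toDF("key", "sumE"), "key")
```

An untyped groupBy(...).agg(...) takes any number of columns, so the limit only bites when the typed KeyValueGroupedDataset API is required.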



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org