You are correct.
You need to provide more context on what you currently do in Hive and what
you expect from the migration.
> On 5. Apr 2018, at 05:43, Pralabh Kumar wrote:
>
> Hi Spark group
>
> What's the best way to migrate Hive to Spark?
>
> 1) Use HiveContext of Spark
> 2) Use
Hi Spark group
What's the best way to migrate Hive to Spark?
1) Use HiveContext of Spark (see the sketch below)
2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
3) Migrate Hive to Calcite to Spark SQL
Regards
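
A minimal sketch of option 1 in Spark 2.x terms, where a Hive-enabled
SparkSession takes over the old HiveContext role (table name and query are
placeholders):

    import org.apache.spark.sql.SparkSession

    // Hive-enabled session: it reads the existing Hive metastore, so
    // existing HQL can usually run unchanged through spark.sql(...)
    val spark = SparkSession.builder()
      .appName("hive-migration")
      .enableHiveSupport()
      .getOrCreate()

    val result = spark.sql("SELECT dept, COUNT(*) FROM my_hive_table GROUP BY dept")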
Hi,
I am running a job in local mode, configured with local[1] for the sake of
the example. The timeline view in the Spark UI is as follows:
[timeline screenshot]
It shows that there are actually two threads running, though their overlap
is very small. To validate this, I also made some changes to Spark's task
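
A minimal sketch of the setup described above (app name is a placeholder).
With local[1] Spark has a single task slot, so any apparent second thread in
the timeline may come from Spark's own internal threads rather than from
task parallelism:

    import org.apache.spark.sql.SparkSession

    // local[1]: one worker thread for tasks; driver and scheduler threads
    // still run alongside it in the same JVM
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("timeline-demo")
      .getOrCreate()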
I am having a heck of a time setting up my development environment. I used
pip to install pyspark. I also downloaded spark from apache.
My Eclipse PyDev interpreter is configured as a python3 virtualenv.
I have a simple unit test that loads a small DataFrame. df.show() generates
the following
Hi,
the way I manage things is: download Spark, set SPARK_HOME, then import
findspark and run findspark.init(). And everything else works just
fine.
I have never tried pip install pyspark though.
Regards,
Gourav Sengupta
On Wed, Apr 4, 2018 at 11:28 PM, Andy Davidson wrote:
I am having trouble setting up my python3 virtualenv.
I created a virtualenv 'spark-2.3.0'. I installed pyspark using pip; however,
I am not able to import pyspark.sql.functions. I get "unresolved import" when
I try to import col() and lit():
from pyspark.sql.functions import *
I found if I
Each partition should be translated into one task, which runs on one
executor; but one executor can process more than one task. I may be wrong,
and will be grateful if someone can correct me.
Regards,
Gourav
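
A minimal sketch illustrating this, e.g. in spark-shell (the flags shown are
for YARN; adjust for your cluster manager):

    // submit with a single one-core executor, e.g.
    //   spark-submit --num-executors 1 --executor-cores 1 app.jar
    // one executor core = one task at a time, so the 500 tasks run serially
    val rdd = sc.parallelize(1 to 100000, 500) // 500 partitions -> 500 tasks
    println(rdd.getNumPartitions)              // 500
    println(rdd.count())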
On Wed, Apr 4, 2018 at 8:13 PM, Thodoris Zois wrote:
>
> Hello
Hello list!
I am trying to familiarize myself with Apache Spark. I would like to ask
something about partitioning and executors.
Can I have, e.g., 500 partitions but launch only one executor that will run
operations on only 1 of the 500 partitions? And then I would like my job to die.
Is there any
Hello,
I am using Apache Spark 2.2.1 with Scala. I am trying to load the JSON below
from Kafka and extract "JOBTYPE" and "LOADID" from the nested JSON object. I
need help with the extraction logic.
Code
val workRequests = new StructType().add("after", new StructType()
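
One possible completion, a hedged sketch assuming the Kafka value is a JSON
string whose nested "after" object carries the two fields (the field types
and the kafkaDf input DataFrame are assumptions):

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructType}

    val workRequests = new StructType()
      .add("after", new StructType()
        .add("JOBTYPE", StringType)
        .add("LOADID", StringType))

    // parse the raw Kafka value, then pull out the nested fields
    val parsed = kafkaDf
      .select(from_json(col("value").cast("string"), workRequests).as("j"))
      .select(col("j.after.JOBTYPE"), col("j.after.LOADID"))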
Yes, a "REST application that submits a Spark job to a k8s cluster by running
spark-submit programmatically", and I would also like to expose it as a
Kubernetes service so that clients can access it like any other REST API.
On Wed, Apr 4, 2018 at 12:25 PM Yinan Li wrote:
> Hi Kittu,
>
>
Hi Kittu,
What do you mean by "a Scala program"? Do you mean a program that submits a
Spark job to a k8s cluster by running spark-submit programmatically, or
some example Scala application that is to run on the cluster?
On Wed, Apr 4, 2018 at 4:45 AM, Kittu M wrote:
> Hi,
Could someone please help me fix the below error in Spark 2.1.0 with
Scala 2.11.8? Basically, I'm migrating the code from Spark 1.6.0 to
Spark 2.1.0.
I'm getting the below exception in Spark 2.1.0:
Error: java.lang.ClassCastException: java.sql.Date cannot be cast to
java.lang.String at
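
Without seeing the code this is only a guess, but a common cause when moving
from 1.6 to 2.x is reading a DateType column as a String. A hedged sketch of
that fix ("df" and the column name "dt" are assumed names):

    import org.apache.spark.sql.functions.{col, date_format}

    // format the DateType column explicitly instead of treating it as a String
    val fixed = df.withColumn("dt_str", date_format(col("dt"), "yyyy-MM-dd"))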
Hi,
I'm looking for a Scala program to spark-submit a Scala application (Spark
2.3 job) on a k8s cluster.
Any help would be much appreciated. Thanks
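
A minimal sketch using SparkLauncher, which ships with Spark (the master URL,
image, main class, and jar path are placeholders; SPARK_HOME must point at a
Spark 2.3 distribution):

    import org.apache.spark.launcher.SparkLauncher

    object SubmitToK8s extends App {
      // launches spark-submit programmatically and returns a handle for
      // monitoring the application's state
      val handle = new SparkLauncher()
        .setMaster("k8s://https://<k8s-apiserver-host>:443")
        .setDeployMode("cluster")
        .setMainClass("com.example.MySparkApp")
        .setAppResource("local:///opt/spark/jars/my-app.jar")
        .setConf("spark.kubernetes.container.image", "my-spark:2.3.0")
        .startApplication()
    }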
Response to the 1st approach:
When you do spark.read.text("/xyz/a/b/filename") it returns a DataFrame, and
applying the rdd method gives you an RDD[Row]; so when you use map, your
function gets a Row as its parameter, i.e. ip in your code. Therefore you
must use the Row methods to access its fields.
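
For example, a minimal sketch of the fix:

    // read.text produces a single string column, so getString(0) retrieves it
    val anotherRDD = newRdd.map(ip => ip.getString(0).split("\\|"))

Alternatively, spark.read.textFile(...) returns a Dataset[String], whose .rdd
is an RDD[String] on which split works directly.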
Our users ask for it
Regards,
Junfeng Chen
On Wed, Apr 4, 2018 at 5:45 PM, Gourav Sengupta wrote:
> Hi Junfeng,
>
> can I ask why it is important to remove the empty column?
>
> Regards,
> Gourav Sengupta
>
> On Tue, Apr 3, 2018 at 4:28 AM, Junfeng Chen
Hi Junfeng,
can I ask why it is important to remove the empty column?
Regards,
Gourav Sengupta
On Tue, Apr 3, 2018 at 4:28 AM, Junfeng Chen wrote:
> I am trying to read data from Kafka and write it in parquet format via
> Spark Streaming.
> The problem is, the data
Hi,
Does anyone have a good architecture document/design principles for building
a warehouse application using Spark?
Is it better to create a Hive context and perform the transformations with
HQL, or to load files directly into a DataFrame and perform the data
transformations there?
We need to implement SCD (slowly changing dimensions)
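
Both routes mentioned above end in a DataFrame, so the transformation logic
can largely be shared; a hedged sketch contrasting them (assuming a
Hive-enabled SparkSession named spark; table and path are placeholders):

    // (a) HQL through a Hive-enabled session
    val viaHql = spark.sql("SELECT * FROM staging.orders")

    // (b) loading files directly into a DataFrame
    val viaFiles = spark.read.parquet("/data/staging/orders")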
See https://gist.github.com/geoHeil/e0799860262ceebf830859716bbf in
particular:
You will probably want to use Spark's imperative (non-SQL) API:
.rdd
.reduceByKey {
  // sum the counts for each (word, path) key
  (count1, count2) => count1 + count2
}.map {
  // re-key by word, giving each word its (path, count) posting
  case ((word, path), n) => (word, (path, n))
}.toDF
i.e. this builds an inverted index
which
Hi all,
I want to run a huge number of queries on a DataFrame in Spark. I have a big
dataset of text documents; I loaded all the documents into a Spark DataFrame
and created a temp table.
dataFrame.registerTempTable("table1");
I have more than 50,000 terms, and I want to get the document frequency for
each by
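
Rather than issuing one query per term, the document frequencies can be
computed in a single pass; a hedged sketch (the column names "docId" and
"text" and whitespace tokenization are assumptions):

    import org.apache.spark.sql.functions.{col, countDistinct, explode, split}

    // one row per (document, term), then count distinct documents per term
    val docFreq = dataFrame
      .select(col("docId"), explode(split(col("text"), "\\s+")).as("term"))
      .groupBy("term")
      .agg(countDistinct("docId").as("docFreq"))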
1st Approach:
Error: value split is not a member of org.apache.spark.sql.Row
val newRdd = spark.read.text("/xyz/a/b/filename").rdd
anotherRDD = newRdd.
  map(ip => ip.split("\\|")).map(ip => Row(if (ip(0).isEmpty()) {
    null.asInstanceOf[Int] }