I am using the following code snippet in Scala:

    val dict: RDD[String] = sc.textFile("path/to/csv/file")
    val dict_broadcast = sc.broadcast(dict.collectAsMap())

On compiling, it generates this error:

    scala:42: value collectAsMap is not a member of
    org.apache.spark.rdd.RDD[String]
    val
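The cause of the error: collectAsMap is only defined for RDDs of key/value pairs (it comes from PairRDDFunctions), and sc.textFile returns a plain RDD[String]. A minimal sketch of one way around it, assuming each CSV line holds a two-column "key,value" layout (the column layout is an assumption, not from the original post):

    import org.apache.spark.rdd.RDD

    val dict: RDD[String] = sc.textFile("path/to/csv/file")

    // collectAsMap needs an RDD[(K, V)], so split each line into a
    // (key, value) tuple first. Assumes two comma-separated columns.
    val pairs: RDD[(String, String)] = dict.map { line =>
      val cols = line.split(",", 2)
      (cols(0), cols(1))
    }

    // Now the pair RDD can be collected to a Map and broadcast.
    val dict_broadcast = sc.broadcast(pairs.collectAsMap())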
Hi all,
I had a couple of questions.
1. Is there documentation on how to add graphframes, or any other
package for that matter, on the Google Dataproc managed Spark clusters?
2. Is there a way to add a package to an existing PySpark context through a
Jupyter notebook?
--aj
In Spark SQL, a timestamp is the number of microseconds since the epoch, so
it has nothing to do with time zones.
When you compare it against a unix_timestamp or a string, it's better to
convert those into timestamps and then compare them.
In your case, the where clause should be:

    where created > cast('{0}' as timestamp)
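As a sketch of how that comparison might look from Scala (the table name, column, and date literal here are hypothetical; the '{0}' in the original is a string-formatting placeholder):

    // Hypothetical table/column names; the cutoff literal stands in for
    // whatever value was substituted into the '{0}' placeholder.
    val cutoff = "2016-03-01 00:00:00"
    val recent = sqlContext.sql(
      s"SELECT * FROM events WHERE created > cast('$cutoff' as timestamp)")
    recent.show()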
terminal_type = 0: 260,000,000 rows, covering almost half of the whole
data. terminal_type = 25066: just 3,800 rows.
orc
Hello,
I am learning SparkR on my own and have little background in computing.
I am following the examples on
http://spark.apache.org/docs/latest/sparkr.html
and running
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)
people <-
Somewhat related, though this JIRA is on 1.6:
https://issues.apache.org/jira/browse/SPARK-13288
Import sqlContext.implicits._ before using toDF().
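A minimal sketch of that pattern (the case class and data are made up for illustration):

    // Importing the implicits brings .toDF() into scope for local
    // collections and RDDs of case classes. Names here are hypothetical.
    case class Person(name: String, age: Int)

    import sqlContext.implicits._
    val df = Seq(Person("alice", 30), Person("bob", 25)).toDF()
    df.show()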
Original message
From: satyajit vegesna
Date: 19/03/2016 06:00 (GMT+05:30)
To: user@spark.apache.org, d...@spark.apache.org
Cc:
Subject: Fwd: DF creation
Hi,
I am