I am doing Spark streaming on a Hortonworks sandbox and am stuck here
now. Can anyone tell me what's wrong with the following code, why it causes
the exception, and how I can fix it? Thank you very much in advance.
spark-submit --jars
I had the same problem before and solved it by removing --jars.
>
> Cheers,
> Anahita
>
> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>
>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>> now, can anyone tell me what's wrong
; anahita.t.am...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think if you remove --jars, it will work. Like:
>>>
>>> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
<mmistr...@gmail.com> wrote:
> Try to use --packages to include the jars. From the error it seems it's
> looking for a main class in the jars, but you are running a python script...
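The point above can be sketched as a submit command; the package coordinates and script name below are illustrative, not from the thread:

```shell
# For a Python app, spark-submit treats the first non-option argument as the
# application. Extra jars go through --jars or --packages; if a bare assembly
# jar is passed in the application position, Spark instead looks for a
# Java/Scala main class in it.
spark-submit \
  --packages com.databricks:spark-csv_2.10:1.5.0 \
  my_streaming_job.py
```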
>
> On 25 Feb 2017 10:36 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>
> That'
@granturing.com>
> *Date: *Monday, January 9, 2017 at 2:59 PM
> *To: *Raymond Xie <xie3208...@gmail.com>, user <user@spark.apache.org>
> *Subject: *Re: How to connect Tableau to databricks spark?
>
>
>
> Hi Raymond,
>
>
>
> Are you using a Spark 2.0 or
.add("minute", StringType)
> val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema)
>
> But somehow I seem to remember that there was a way, in Spark 2.0, to
> have Spark infer the schema for you.
>
> hth
> marco
>
>
>
>
Hello,
I am new to Spark. As a SQL developer, I only took some courses online and
spent some time on my own, but never had a chance to work on a real project.
I wonder what would be the best practice (tool, procedure...) to load data
(csv, excel) into the Spark platform?
Thank you.
*Raymond*
xcheun...@hotmail.com>
wrote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
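For Spark 1.6 the package linked above is pulled in at launch time; a hedged sketch (version coordinates and file names illustrative):

```shell
# spark-csv is not bundled with Spark 1.6; fetch it via --packages.
spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 load_csv.py

# Inside the job the reader is addressed by its format name:
#   sqlContext.read.format("com.databricks.spark.csv") \
#             .options(header="true", inferSchema="true") \
#             .load("Employee.csv")
```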
>
>
> ----------
> *From:* Raymond Xie <xie3208...@gmail.com>
> *Sent:* Friday, December 30, 2016 6:46:11 PM
> *To:* user@spark.apache.org
> *Subject:* How to load a big csv to dataframe in Spark 1.6
Hello,
I see there is usually this way to load a csv to dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\..\Employee.csv").map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
Employee_df.show()
However in my
Sent from my Samsung device
-------- Original message --------
From: Raymond Xie <xie3208...@gmail.com>
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Hello,
It is indicated in
https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#dataframes
under "Running SQL Queries Programmatically" that you can do:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.sql("SELECT * FROM table")
However, it did not indicate what
*Sincerely yours,*
*Raymond*
On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales <therevolti...@gmail.com>
wrote:
> Looks like it's trying to treat that path as a folder, try omitting
> the file name and just use the folder path.
>
> On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie <xie3208..
> On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie <xie3208...@gmail.com> wrote:
> > Happy new year!!!
> >
> > I am trying to load a json file into spark, the json file is attached
> here.
> >
>
*Sincerely yours,*
*Raymond*
I want to do some data analytics work by leveraging Databricks spark
platform and connect my Tableau desktop to it for data visualization.
Has anyone ever made it work? I've been trying to follow the instructions
below but have not been successful.
some path.
> Maybe a special char or space in your path.
>
> Regards,
> Vaquar khan
>
> On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote:
>
>> I am trying to run spark-shell in Windows but receive error of:
>>
>> \Java\jre1.8.0_151\bin\java was unexpected at this time.
I am trying to run spark-shell in Windows but receive error of:
\Java\jre1.8.0_151\bin\java was unexpected at this time.
Environment:
System variables:
SPARK_HOME:
c:\spark
Path:
C:\Program Files (x86)\Common
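The "\Java\jre1.8.0_151\bin\java was unexpected at this time." message is cmd.exe syntax noise that typically appears when a PATH or JAVA_HOME entry contains unquoted parentheses, e.g. under "Program Files (x86)". A hedged sketch of one common workaround (paths illustrative):

```bat
:: Point JAVA_HOME at a parenthesis-free location (or re-install Java
:: outside "Program Files (x86)"), then put its bin dir first on PATH.
set "JAVA_HOME=C:\Java\jre1.8.0_151"
set "PATH=%JAVA_HOME%\bin;%PATH%"
spark-shell
```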
I have a 3.6GB csv dataset (4 columns, 100,150,807 rows); my environment is
a 20GB SSD hard disk and 2GB of RAM.
The dataset comes with
User ID: 987,994
Item ID: 4,162,024
Category ID: 9,439
Behavior type ('pv', 'buy', 'cart', 'fav')
Unix Timestamp: spanning November 25 to December 03, 2017
I
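A back-of-envelope check, using only the figures quoted above, shows why this dataset cannot be processed fully in memory on that machine (the 3x in-memory blow-up factor is an assumption, not a measurement):

```python
# Figures from the post: a 3.6 GB csv with 100,150,807 rows.
csv_bytes = 3.6 * 1024**3
rows = 100_150_807

bytes_per_row_on_disk = csv_bytes / rows          # roughly 38-39 bytes/row

# Assumption: in-memory representations (pandas objects, JVM rows) often
# need several times the on-disk size; 3x is a rough, conservative guess.
est_in_memory_gb = csv_bytes * 3 / 1024**3

print(round(bytes_per_row_on_disk, 1))
print(round(est_in_memory_gb, 1))                 # far above the 2 GB of RAM
```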
Hello,
I am wondering how I can run a Spark job in my environment, which is a
single Ubuntu host with no Hadoop installed. If I run my job like below, I
end up with an infinite loop at the end. Thank you very much.
rxie@ubuntu:~/data$ spark-submit --class retail_db.GetRevenuePerOrder
--conf
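Spark does not require a Hadoop installation for this; in local mode the master URL keeps everything on the single host. A hedged sketch reusing the class name from the post (jar path and core count illustrative):

```shell
# local[2] = run on this machine with 2 worker threads; no Hadoop needed.
spark-submit \
  --master "local[2]" \
  --class retail_db.GetRevenuePerOrder \
  target/scala-2.11/spark2practice_2.11-0.1.jar
```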
Hello,
It would be really appreciated if anyone could help me sort out the
following path issue. I highly doubt this is related to a missing path
setting, but I don't know how I can fix it.
rxie@ubuntu:~/Downloads/spark$ echo $PATH
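A few shell checks that usually narrow a PATH problem like this down (directory names illustrative):

```shell
which java && java -version          # is a JVM resolvable at all?
echo "$SPARK_HOME"                   # should name the unpacked Spark dir

# If spark-shell is not found, put the Spark bin dir on PATH explicitly:
export SPARK_HOME="$HOME/Downloads/spark"
export PATH="$SPARK_HOME/bin:$PATH"
spark-shell --version
```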
Hello, I am doing the practice in Ubuntu now; here is the error I am
encountering:
rxie@ubuntu:~/Downloads/spark/bin$ spark-shell
Error: Could not find or load main class org.apache.spark.launcher.Main
What am I missing?
Thank you very much.
Java is installed.
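org.apache.spark.launcher.Main ships inside Spark's own jars, so when spark-shell cannot load it the usual suspects are an incomplete download/extract or a SPARK_HOME pointing at the wrong directory. Checks to try (layouts differ by Spark version):

```shell
# Spark 2.x layout: the launcher class lives in jars/
ls "$SPARK_HOME"/jars/spark-launcher_*.jar

# Spark 1.x layout: everything is in one assembly under lib/
ls "$SPARK_HOME"/lib/spark-assembly-*.jar

# If neither exists, re-download and re-extract the Spark tarball.
```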
>
> Best Regards,
>
> Vamshi T
Hello, I am doing the practice in Windows now.
I have the jar file generated under:
C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar
The package name is Retail_db and the object is GetRevenuePerOrder.
The spark-submit command is:
spark-submit
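The command is cut off above; a hedged sketch of its usual shape on Windows, built from the jar path and object name quoted in the post (the master URL is an assumption):

```bat
spark-submit ^
  --master local[2] ^
  --class Retail_db.GetRevenuePerOrder ^
  C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar
```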
parser is based on univocity and you might use the
> "spark.read.csv" syntax instead of using the rdd api;
>
> From my experience, this will be better than any other csv parser
>
> 2018-06-19 16:43 GMT+02:00 Raymond Xie :
>
>> Thank you Matteo, Askash and Georg:
>>
>&
> wrote:
>>
>>> use pandas or dask
>>>
>>> If you do want to use spark store the dataset as parquet / orc. And then
>>> continue to perform analytical queries on that dataset.
>>>
>>> Raymond Xie wrote on Tue., 19 June 2018 at
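The out-of-core idea above can be sketched with pandas alone: read the csv in chunks so only a slice is ever in memory, aggregating as you go (the inline csv and column names are made up; a real run would pass a file path and a much larger chunksize):

```python
import io

import pandas as pd

# Inline stand-in for a csv that is too large to load at once.
csv_data = io.StringIO("user_id,item_id\n1,10\n1,11\n2,10\n")

total_rows = 0
per_user = {}
for chunk in pd.read_csv(csv_data, chunksize=2):      # tiny chunks for demo
    total_rows += len(chunk)
    for uid, n in chunk["user_id"].value_counts().items():
        per_user[int(uid)] = per_user.get(int(uid), 0) + int(n)

print(total_rows)   # 3
print(per_user)     # {1: 2, 2: 1}
```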
Hello,
I have a dataframe; applying from_unixtime seems to expose an anomaly:
scala> val bhDF4 = bhDF.withColumn("ts1", $"ts" + 28800).withColumn("ts2",
  from_unixtime($"ts" + 28800, "MMddhhmmss"))
bhDF4: org.apache.spark.sql.DataFrame = [user_id: int, item_id: int ... 5
more fields]
scala>
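One likely source of the anomaly is the pattern itself: in Java's SimpleDateFormat (which from_unixtime uses) "hh" is the 12-hour clock, so afternoon hours wrap to 01-12, while "HH" is the 24-hour field. (from_unixtime also renders in the session time zone, so adding 28800 on top of a non-UTC session shifts twice.) The same pitfall reproduced with Python's strftime, where %I and %H play the roles of hh and HH (the timestamp is an arbitrary value in the dataset's range):

```python
import time

ts = 1511594470                 # arbitrary instant inside Nov 25 - Dec 3, 2017
shifted = ts + 28800            # the +8h offset applied in the post

# %I = 12-hour clock (Java "hh"), %H = 24-hour clock (Java "HH").
twelve_hour = time.strftime("%m%d%I%M%S", time.gmtime(shifted))
twenty_four = time.strftime("%m%d%H%M%S", time.gmtime(shifted))

print(twelve_hour)   # 1125032110  <- 15:21:10 wraps to "03"
print(twenty_four)   # 1125152110
```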