I'm not sure how to describe this scenario in words, so let me show some
example SQL.
Given the table schema:
create table customer (
  c_custkey bigint,
  c_name string,
  c_orders array<struct<o_totalprice: double>>
)
Now I want to know each customer's `avg(o_totalprice)`. Maybe I can use
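One possible way to express this (a sketch, assuming Spark SQL and that the
struct inside c_orders carries the o_totalprice field) is to explode the
array and then aggregate:

  // Hedged sketch: flatten each customer's orders, then average per customer.
  // Assumes an existing SparkSession named `spark` with `customer` registered.
  val avgPerCustomer = spark.sql("""
    SELECT c_custkey, AVG(o.o_totalprice) AS avg_totalprice
    FROM customer
    LATERAL VIEW explode(c_orders) t AS o
    GROUP BY c_custkey
  """)
  avgPerCustomer.show()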
Maybe your application is overriding the master variable when it creates its
SparkContext. I see you are still passing “yarn-client” to it as an argument
later in your command.
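For illustration (a hedged sketch, not Raymond's actual code), this is the
kind of pattern that would cause it: a master taken from the program
arguments wins over --master on the spark-submit command line.

  import org.apache.spark.{SparkConf, SparkContext}

  // Inside main: if args(0) is "yarn-client", this setting overrides
  // the --master local[*] flag passed to spark-submit.
  val conf = new SparkConf()
    .setAppName("GetRevenuePerOrder")
    .setMaster(args(0))
  val sc = new SparkContext(conf)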
> On Jun 17, 2018, at 11:53 AM, Raymond Xie wrote:
>
> Thank you Subhash.
>
> Here is the new command:
>
Hi everyone,
I am trying to understand the behaviour of .as[SomeClass] (Dataset API):
Say I have a file with Users:
case class User(id: Int, name: String, address: String, date_add: java.sql.Date)
val users = sc.parallelize(Stream.fill(100)(User(0, "test", "Test Street", new
java.sql.Date(0,
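For reference, a self-contained sketch of this kind of setup (the snippet
above is cut off, so the remaining constructor arguments and the write/read
round trip here are assumptions):

  import java.sql.Date
  import org.apache.spark.sql.SparkSession

  case class User(id: Int, name: String, address: String, date_add: Date)

  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  // Build 100 identical users, write them out, read them back as a Dataset.
  val users = Seq.fill(100)(User(0, "test", "Test Street", Date.valueOf("1970-01-01"))).toDS()
  users.write.mode("overwrite").parquet("/tmp/users")
  val ds = spark.read.parquet("/tmp/users").as[User]  // .as resolves columns by name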
Consider there is a Spark query (A) which depends on Kafka topics t1 and
t2.
After running this query in streaming mode, a checkpoint (C1) directory
for the query gets created with offsets and sources directories. Now I add a
third topic (t3) on which the query depends.
Now if I
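As a point of reference, a hedged sketch of the setup being described (the
topic names and checkpoint label come from the question; the broker address
and paths are assumptions):

  // Streaming query over two Kafka topics, checkpointed at C1.
  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  // assumed broker
    .option("subscribe", "t1,t2")                         // later changed to "t1,t2,t3"
    .load()

  df.writeStream
    .format("console")
    .option("checkpointLocation", "/path/to/C1")          // holds offsets/ and sources/
    .start()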
Thank you Subhash.
Here is the new command:
spark-submit --master local[*] --class retail_db.GetRevenuePerOrder --conf
spark.ui.port=12678 spark2practice_2.11-0.1.jar yarn-client
/public/retail_db/order_items /home/rxie/output/revenueperorder
Still seeing the same issue here.
2018-06-17 11:51:25
Hi Raymond,
If you set your master to local[*] instead of yarn-client, it should run on
your local machine.
Thanks,
Subhash
Sent from my iPhone
> On Jun 17, 2018, at 2:32 PM, Raymond Xie wrote:
>
> Hello,
>
> I am wondering how I can run a Spark job in my environment, which is a single
>
Hello,
I am wondering how I can run a Spark job in my environment, which is a single
Ubuntu host with no Hadoop installed. If I run my job like below, I end up
in an infinite loop at the end. Thank you very much.
rxie@ubuntu:~/data$ spark-submit --class retail_db.GetRevenuePerOrder
--conf
Totally agreed with Eyal.
The problem is that when the Java programs Catalyst generates from programs
using DataFrame and Dataset are compiled into Java bytecode, the bytecode of
a single method must not be 64 KB or more. Large generated methods conflict
with this limitation of the Java class file format, which is
Thank you Vamshi,
Yes, the path has presumably been added; here it is:
rxie@ubuntu:~/Downloads/spark$ echo $PATH
/home/rxie/Downloads/spark
Raymond,
Is your SPARK_HOME set? In your .bash_profile, try setting the below:
export SPARK_HOME=/home/Downloads/spark (or wherever your spark is downloaded
to)
Once done, source your .bash_profile or restart the shell and try spark-shell
again.
Best Regards,
Vamshi T
Hello,
It would be really appreciated if anyone could help me sort out the following
path issue. I highly doubt this is related to a missing path setting, but I
don't know how I can fix it.
rxie@ubuntu:~/Downloads/spark$ echo $PATH
Hello, I am doing the practice on Ubuntu now; here is the error I am
encountering:
rxie@ubuntu:~/Downloads/spark/bin$ spark-shell
Error: Could not find or load main class org.apache.spark.launcher.Main
What am I missing?
Thank you very much.
Java is installed.
Hi Raymond,
I see a small correction you can make in your spark-submit command. Your
spark-submit command should say:
spark-submit --master local --class <package name>.<class name> <Jar
Location and JarName>
Example:
spark-submit --master local \
--class retail_db.GetRevenuePerOrder
Hello, I am doing the practice on Windows now.
I have the jar file generated under:
C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar
The package name is Retail_db and the object is GetRevenuePerOrder.
The spark-submit command is:
spark-submit
Hi Akash,
Such errors might appear in large Spark pipelines; the root cause is a 64 KB
JVM limitation.
The reason your job isn't failing at the end is Spark's fallback: if code
generation is failing, the Spark compiler will try to create the flow
without the code gen (less optimized).
If you do not
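For context, a hedged sketch of how one might force that less-optimized path
up front while debugging; spark.sql.codegen.wholeStage is the configuration
involved, and the SparkSession name is assumed:

  // Disable whole-stage code generation so Spark uses the interpreted path
  // directly, instead of falling back only when generated code fails to compile.
  spark.conf.set("spark.sql.codegen.wholeStage", "false")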