Hi Team,
Any good book recommendations for getting in-depth knowledge, from zero to production?
Let me know.
Thanks.

Hi,
I am getting this error:
---
Py4JError Traceback (most recent call last)
in ()
      3 TOTAL = 100
      4 dots = sc.parallelize([2.0 * np.random.random(2) - 1.0 for i in range(TOTAL)]).cache()
----> 5 print("Number of random
[...]
utor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
---
Please help me with this.
Thanks,
Nandan Priyadarshi
Hi Users,
Currently I am trying to use Apache Spark 2.2.0 from a Jupyter notebook, but I am not able to get it working.
I am using Ubuntu 17.10.
I can use pyspark on the command line, as well as spark-shell. Please share some ideas.
Thanks.
Nandan Priyadarshi
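One common way to wire Spark 2.x into Jupyter is to point the `pyspark` launcher at the notebook via its driver environment variables. A minimal sketch; the `SPARK_HOME` path is a hypothetical install location, adjust it to wherever Spark is unpacked:

```shell
# Hypothetical install location -- adjust to your unpacked Spark 2.2.0.
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"

# Tell the pyspark launcher to start Jupyter as the driver process.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

# Running `pyspark` now opens a notebook with `sc` predefined.
```

An alternative is the `findspark` package, which lets a plain `jupyter notebook` session locate `SPARK_HOME` and import `pyspark` directly.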
ucture???
Thanks and Regards,
Nandan
Hello,
I am trying to combine several small text files (each file is approximately hundreds of MB to 2-3 GB) into one big Parquet file.
I am loading each one of them and taking a union; however, this leads to an enormous number of partitions, as union keeps adding the partitions of its inputs.
Hello everyone,
Generally speaking, I guess it's well known that DataFrames are much faster than RDDs when it comes to performance.
My question is: how do you go about transforming a DataFrame using map? The DataFrame then gets converted into an RDD, so do you convert it back to a DataFrame again?
Thanks,
Nandan