Re: Is RDD.persist honoured if multiple actions are executed in parallel

2020-09-24 Thread Michael Mior
If you want to ensure the persisted RDD has been calculated first, just run foreach with a dummy function beforehand to force evaluation. -- Michael Mior michael.m...@gmail.com On Thu, 24 Sep 2020 at 00:38, Arya Ketan wrote: > > Thanks, we were able to validate the same behaviour. > >

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Michael Mior
It's fairly common for adapters (Calcite's abstraction of a data source) to push down predicates. However, the API certainly looks a lot different from Catalyst's. -- Michael Mior mm...@apache.org On Mon, 13 Jan 2020 at 09:45, Jason Nerothin wrote: > > The implementation they chose su

Re: Using Spark with Local File System/NFS

2017-06-22 Thread Michael Mior
If you put a * in the path, Spark will look for a file or directory named *. To read all the files in a directory, just remove the star. -- Michael Mior michael.m...@gmail.com On Jun 22, 2017 17:21, "saatvikshah1994" <saatvikshah1...@gmail.com> wrote: > Hi, > > I've dow

Re: "Sharing" dataframes...

2017-06-21 Thread Michael Mior
not very familiar with either project, so perhaps there are some big concerns I'm not aware of. -- Michael Mior mm...@apache.org 2017-06-21 3:19 GMT-04:00 Rick Moritz <rah...@gmail.com>: > Keeping it inside the same program/SparkContext is the most performant > solution, since y

Re: Do we anything for Deep Learning in Spark?

2017-06-20 Thread Michael Mior
It's still in the early stages, but check out Deep Learning Pipelines from Databricks https://github.com/databricks/spark-deep-learning -- Michael Mior mm...@apache.org 2017-06-20 0:36 GMT-04:00 Gaurav1809 <gauravhpan...@gmail.com>: > Hi All, > > Similar to how we have machine l

Re: [SparkSQL] Escaping a query for a dataframe query

2017-06-15 Thread Michael Mior
able WHERE (mycolumn BETWEEN 1 AND 2) AND (myudfsearchfor(\"start\\\"end\"))" -- Michael Mior mm...@apache.org 2017-06-15 12:05 GMT-04:00 mark.jenki...@baesystems.com <mark.jenki...@baesystems.com>: > Hi, > > > > I have a query sqlContext.sql(“SELE
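The escaped query in this reply passes through two layers: first the host-language string literal (Scala, Java, and Python all turn `\"` into `"` and `\\` into `\`), then the SQL string literal, where Spark SQL's default parser treats `\"` inside a double-quoted literal as an escaped quote. The Python sketch here is a rough illustration of that, not Spark code: the table name `mytable` is hypothetical (the real one is cut off in the archive), and `sql_quote` / `sql_unquote` are simplified stand-ins that only handle backslash escapes.

```python
# Two escaping layers for a UDF argument containing a double quote.
# Hypothetical names: `mytable` (the real table name is truncated in the
# archive), and the helpers sql_quote / sql_unquote, which are simplified
# models, not Spark APIs.


def sql_quote(value: str) -> str:
    """Wrap a raw value in a double-quoted SQL literal, escaping
    backslashes first and then double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def sql_unquote(literal: str) -> str:
    """Simplified inverse of sql_quote: strip the outer quotes and keep
    each backslash-escaped character verbatim."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("expected a double-quoted literal")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # drop the backslash, keep the char
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# Layer 1: the host-language literal. \" becomes " and \\ becomes \, so the
# SQL text handed to the parser contains myudfsearchfor("start\"end").
hand_escaped = "SELECT * FROM mytable WHERE (mycolumn BETWEEN 1 AND 2) AND (myudfsearchfor(\"start\\\"end\"))"

# The same SQL text built from the raw value instead of escaping by hand.
raw_value = 'start"end'
generated = (
    "SELECT * FROM mytable WHERE (mycolumn BETWEEN 1 AND 2) "
    f"AND (myudfsearchfor({sql_quote(raw_value)}))"
)
print(hand_escaped)
print(generated == hand_escaped)

# Layer 2: the SQL parser unescapes the literal, so the UDF sees start"end.
print(sql_unquote(sql_quote(raw_value)))
```

Building the literal with a helper keeps the escaping rules in one place instead of counting backslashes by hand.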

Re: Number Of Partitions in RDD

2017-06-01 Thread Michael Mior
While I'm not sure why you're seeing an increase in partitions with such a small data file, it's worth noting that the second parameter to textFile is the *minimum* number of partitions so there's no guarantee you'll get exactly that number. -- Michael Mior mm...@apache.org 2017-06-01 6:28 GMT
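One plausible reason for getting more partitions than requested: `SparkContext.textFile` hands its `minPartitions` argument to Hadoop's old-API `FileInputFormat.getSplits` as a hint, and the split size derived from it can leave a small remainder that becomes an extra split. The sketch below is a simplified Python model of that arithmetic, not Hadoop's or Spark's actual code. It assumes a single non-empty, uncompressed file and the default 1-byte minimum split size; the 128 MiB block size default is likewise an assumption (the usual HDFS default).

```python
# Simplified model of split counting in Hadoop's old-API
# FileInputFormat.getSplits, which SparkContext.textFile calls with its
# minPartitions argument. Assumptions: a single non-empty, uncompressed
# (splittable) file and the default 1-byte minimum split size.

SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def count_splits(file_size: int, min_partitions: int,
                 block_size: int = 128 * 1024 * 1024) -> int:
    """Return how many input splits (i.e. RDD partitions) the model yields."""
    goal_size = file_size // max(min_partitions, 1)  # bytes per split requested
    split_size = max(1, min(goal_size, block_size))  # clamp to [min size, block size]
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the leftover bytes form one last split
    return splits


# A tiny 10-byte file with minPartitions=3: integer division gives a 3-byte
# split size, leaving a 1-byte remainder that becomes a fourth partition.
print(count_splits(10, 3))
# A 1 GiB file with minPartitions=2 is capped by the block size instead.
print(count_splits(1024 ** 3, 2))
```

In this model, both integer rounding on small files and the block-size cap on large files can push the partition count above the requested `minPartitions`.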