If you want to ensure the persisted RDD has actually been computed,
run foreach with a dummy function to force evaluation.
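
A minimal sketch of that trick in PySpark, assuming `rdd` is any RDD. The
helper name `materialize` is made up for illustration and is not part of
Spark:

```python
def materialize(rdd):
    """Mark an RDD as persisted, then force it to be computed.

    `materialize` is a hypothetical helper name, not a Spark API.
    """
    rdd.persist()                # lazy: only marks the RDD for caching
    rdd.foreach(lambda _: None)  # action with a no-op function; runs the job
    return rdd
```

With a live SparkContext `sc`, something like
`materialize(sc.parallelize(range(10)))` should return only after the job
has run, so later actions can read the cached partitions.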
--
Michael Mior
michael.m...@gmail.com
On Thu, Sep 24, 2020 at 00:38, Arya Ketan wrote:
>
> Thanks, we were able to validate the same behaviour.
>
>
It's fairly common for adapters (Calcite's abstraction of a data
source) to push down predicates. However, the API certainly looks a
lot different than Catalyst's.
--
Michael Mior
mm...@apache.org
On Mon, Jan 13, 2020 at 09:45, Jason Nerothin wrote:
>
> The implementation they chose su
If you put a * in the path, Spark will look for a file or directory named
*. To read all the files in a directory, just remove the star.
--
Michael Mior
michael.m...@gmail.com
On Jun 22, 2017 17:21, "saatvikshah1994" <saatvikshah1...@gmail.com> wrote:
> Hi,
>
> I've dow
not very familiar with either project, so perhaps there are
some big concerns I'm not aware of.
--
Michael Mior
mm...@apache.org
2017-06-21 3:19 GMT-04:00 Rick Moritz <rah...@gmail.com>:
> Keeping it inside the same program/SparkContext is the most performant
> solution, since y
It's still in the early stages, but check out Deep Learning Pipelines from
Databricks
https://github.com/databricks/spark-deep-learning
--
Michael Mior
mm...@apache.org
2017-06-20 0:36 GMT-04:00 Gaurav1809 <gauravhpan...@gmail.com>:
> Hi All,
>
> Similar to how we have machine l
able WHERE (mycolumn BETWEEN 1 AND 2) AND
(myudfsearchfor(\"start\\\"end\"))"
--
Michael Mior
mm...@apache.org
2017-06-15 12:05 GMT-04:00 mark.jenki...@baesystems.com <
mark.jenki...@baesystems.com>:
> *Hi,*
>
>
>
> *I have a query **sqlContext.sql(“**SELE
While I'm not sure why you're seeing an increase in partitions with such a
small data file, it's worth noting that the second parameter to textFile is
the *minimum* number of partitions, so there's no guarantee you'll get
exactly that number.
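
If an exact partition count matters, one option is to check the count after
reading and repartition when it differs. This is only a sketch in PySpark:
the helper name `read_with_exact_partitions` is hypothetical, and `sc` is
assumed to be a live SparkContext.

```python
def read_with_exact_partitions(sc, path, n):
    """Read a text file and return an RDD with exactly n partitions.

    Hypothetical helper; `sc` is assumed to be a SparkContext.
    """
    rdd = sc.textFile(path, minPartitions=n)  # n is a minimum, not a guarantee
    if rdd.getNumPartitions() != n:
        # repartition shuffles the data into exactly n partitions
        rdd = rdd.repartition(n)
    return rdd
```

Note that repartition triggers a shuffle, so it is only worth doing when the
exact number really matters.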
--
Michael Mior
mm...@apache.org
2017-06-01 6:28 GMT