unsubscribe

2023-09-13 Thread randy clinton
unsubscribe -- I appreciate your time, ~Randy - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2022-07-14 Thread randy clinton
-- I appreciate your time, ~Randy

Re: Hey good looking toPandas () error stack

2020-06-21 Thread randy clinton
You can see from the GitHub history for "toPandas()" that the function has been in the code for 5 years: https://github.com/apache/spark/blame/a075cd5b700f88ef447b559c6411518136558d78/python/pyspark/sql/dataframe.py#L923 When I Google IllegalArgumentException: 'Unsupported class file major

Re: Spark dataframe hdfs vs s3

2020-05-29 Thread randy clinton
n-aws-s3-by-10x-with-alluxio-tiered-storage/ > > > On Wed, May 27, 2020 at 6:52 PM Dark Crusader <relinquisheddra...@gmail.com> wrote: > >> Hi Randy, >> >> Yes, I'm using parquet on both S3 and hdfs. >> >> On Thu, 28 May, 2020, 2:38 am randy clinton,

Re: Spark dataframe hdfs vs s3

2020-05-28 Thread randy clinton
-5-reasons-for-choosing-s3-over-hdfs.html <https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html>* On Wed, May 27, 2020, 9:51 PM Dark Crusader wrote: > Hi Randy, > > Yes, I'm using parquet on both S3 and hdfs. > > On Thu, 28 May, 2020,

Re: Spark dataframe hdfs vs s3

2020-05-27 Thread randy clinton
Is the file Parquet on S3, or is it some other file format? In general I would assume that HDFS reads and writes are more performant for Spark jobs. For instance, consider how well partitioned your HDFS file is compared with the S3 file. On Wed, May 27, 2020 at 1:51 PM Dark Crusader wrote: > Hi Jörn, > >

Re: Left Join at SQL query gets planned as inner join

2020-04-30 Thread randy clinton
Does it still plan an inner join if you remove the filters on both tables? It seems like you are asking for a left join, but your filters demand the behavior of an inner join. Maybe you could apply the filters to the tables first and then join them. Something roughly like: s_DF = s_DF.filter(year =
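
The "filter first, then join" suggestion can be illustrated without a Spark cluster. The sketch below is not the original poster's query: it uses Python's built-in sqlite3 module instead of Spark, and the tables s and p, their columns, and the sample rows are all made up for illustration. It assumes the filter on the right-hand table sits in the WHERE clause after the join. In that case rows with no match (NULLs on the right side) fail the filter, so the left join returns the same rows an inner join would. Filtering each table before the join keeps the unmatched left rows.

```python
import sqlite3


def build_db():
    """Create two small in-memory tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE s (id INTEGER, year INTEGER);
        CREATE TABLE p (id INTEGER, year INTEGER, amount INTEGER);
        INSERT INTO s VALUES (1, 2020), (2, 2020), (3, 2019);
        INSERT INTO p VALUES (1, 2020, 10), (3, 2019, 30);
        """
    )
    return conn


def filter_after_join(conn):
    """LEFT JOIN, then filter both sides in WHERE.

    The condition p.year = 2020 is never true for NULL-extended rows,
    so unmatched rows from s are dropped, just as with an inner join.
    """
    return conn.execute(
        """
        SELECT s.id, p.amount
        FROM s LEFT JOIN p ON s.id = p.id
        WHERE s.year = 2020 AND p.year = 2020
        ORDER BY s.id
        """
    ).fetchall()


def filter_before_join(conn):
    """Filter each table first, then LEFT JOIN the filtered results.

    Unmatched rows from the filtered left side survive with NULLs.
    """
    return conn.execute(
        """
        SELECT sf.id, pf.amount
        FROM (SELECT * FROM s WHERE year = 2020) AS sf
        LEFT JOIN (SELECT * FROM p WHERE year = 2020) AS pf
          ON sf.id = pf.id
        ORDER BY sf.id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(filter_after_join(conn))   # [(1, 10)]
    print(filter_before_join(conn))  # [(1, 10), (2, None)]
```

In PySpark the same idea would be to call filter() on each DataFrame before join(..., how="left"), as the reply starts to sketch. Whether that changes the plan for the original query depends on details not shown in this thread.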