Re: A simple comparison for three SQL engines

2022-04-09 Thread Wes Peng
May I forward this report to the Spark list as well? Thanks. Wes Peng wrote: Hello, this weekend I ran a test against a big dataset; Spark, Drill, MySQL and PostgreSQL were involved. This is the final report: https://blog.cloudcache.net/handles-the-file-larger-than-memory/ The simple conclusion

Re: Executorlost failure

2022-04-07 Thread Wes Peng
I just did a test: even on a single node (local deployment), Spark can handle data whose size is much larger than the total memory. My test VM (2 GB RAM, 2 cores):

$ free -m
        total   used   free   shared   buff/cache   available
Mem:     1992   1845
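The claim above rests on out-of-core processing: an engine that works through data chunk by chunk never needs the whole dataset in memory at once. A minimal pure-Python sketch of that general idea (illustrative only, not Spark internals):

```python
# Illustrative sketch (not Spark internals): out-of-core processing.
# A dataset larger than a fixed memory budget is handled in chunks,
# so peak memory is bounded by the chunk size, not the data size.

def process_larger_than_memory(total_records, chunk_size):
    """Sum record values while holding at most `chunk_size` records at once."""
    total = 0
    peak_in_memory = 0
    for start in range(0, total_records, chunk_size):
        chunk = list(range(start, min(start + chunk_size, total_records)))
        peak_in_memory = max(peak_in_memory, len(chunk))
        total += sum(chunk)
    return total, peak_in_memory

# A "dataset" of 1,000,000 records processed with room for only 10,000 at a time.
total, peak = process_larger_than_memory(1_000_000, 10_000)
print(total, peak)
```

Peak residency stays at the chunk size no matter how large the input grows, which is the same reason a 2 GB VM can get through a much larger file.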

Re: Executorlost failure

2022-04-07 Thread Wes Peng
I once had a 100+ GB file computed on 3 nodes, each with only 24 GB of memory, and the job completed fine. So from my experience a Spark cluster works correctly on files larger than memory by spilling them to disk. Thanks rajat kumar wrote: Tested this with
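Back-of-the-envelope arithmetic shows why this works: the file is split into partitions, and only the partitions of concurrently running tasks need to be resident. A hedged sketch with the 100 GB / 3-node figures from the post (the 128 MB split size is the common default; the per-node core count is an assumed example value):

```python
# Rough partition arithmetic; split size and core count are assumptions.
file_gb = 100
partition_mb = 128                              # typical default split size
partitions = (file_gb * 1024) // partition_mb   # number of tasks

nodes, cores_per_node = 3, 8                    # cores per node is hypothetical
concurrent_tasks = nodes * cores_per_node       # tasks in flight at once
working_set_gb = concurrent_tasks * partition_mb / 1024

print(partitions, concurrent_tasks, working_set_gb)
```

Only a few GB of partition data is in flight at any moment, far under the 72 GB of cluster memory, with the rest of the 100 GB waiting on disk.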

Re: Executorlost failure

2022-04-07 Thread Wes Peng
How many executors do you have? rajat kumar wrote: Tested this with executors of size 5 cores, 17 GB memory. Data volume is really high, around 1 TB - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
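The executor count matters because it fixes how many task slots the 1 TB of work is spread over. A sizing sketch using the figures from the thread (5 cores / 17 GB per executor); the executor count and split size are assumed example values:

```python
# Rough sizing arithmetic; executor count and split size are assumptions.
data_tb = 1
partition_mb = 128                               # assumed split size
tasks = data_tb * 1024 * 1024 // partition_mb    # total tasks for ~1 TB

executors = 20                                   # hypothetical count
cores_per_executor = 5                           # from the thread
slots = executors * cores_per_executor           # concurrent task slots
waves = -(-tasks // slots)                       # ceiling division: task waves

print(tasks, slots, waves)
```

More executors means fewer waves of tasks, but each wave still only needs its own partitions in executor memory.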

query time comparison to several SQL engines

2022-04-07 Thread Wes Peng
I made a simple test of query time for several SQL engines, including MySQL, Hive, Drill and Spark. The report: https://cloudcache.net/data/query-time-mysql-hive-drill-spark.pdf It may have no special meaning, just for fun. :) regards.
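For a comparison like this, the measurement harness is simple: run the same query against each engine and take the best wall-clock time over a few repeats. A minimal sketch; `run_query` is a stand-in for issuing the SQL through each engine's client library:

```python
# Minimal query-timing harness; `run_query` is a placeholder callable.
import time

def time_query(run_query, repeats=3):
    """Return the best wall-clock time over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best

# Dummy workload standing in for a real SQL round trip.
elapsed = time_query(lambda: sum(range(100_000)))
print(f"{elapsed:.6f}s")
```

Taking the best of several runs reduces noise from caches warming up and one-off scheduling hiccups.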

Re: Profiling spark application

2022-01-19 Thread Wes Peng
Take a look at this: https://github.com/LucaCanali/sparkMeasure On 2022/1/20 1:18, Prasad Bhalerao wrote: Is there any way we can profile Spark applications that shows the number of invocations of Spark APIs and their execution times, just the way JProfiler shows all the details?
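A typical usage sketch of sparkMeasure's stage-level metrics in PySpark, following the project's README; this needs a live SparkSession and the sparkmeasure package on the classpath, and the exact package version is left unspecified here:

```python
# Usage sketch for sparkMeasure (https://github.com/LucaCanali/sparkMeasure).
# Requires a running SparkSession and the package, launched e.g. with:
#   spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.12:<version> ...
from sparkmeasure import StageMetrics

stagemetrics = StageMetrics(spark)   # `spark` is your active SparkSession
stagemetrics.begin()
spark.sql("SELECT count(*) FROM range(1000 * 1000)").show()
stagemetrics.end()
stagemetrics.print_report()          # stage times, shuffle and I/O metrics
```

This gives per-stage timings and I/O counters rather than JProfiler-style per-method call counts, but it is usually the more useful granularity for Spark jobs.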

Re: [Pyspark] How to download Zip file from SFTP location and put in into Azure Data Lake and unzip it

2022-01-18 Thread Wes Peng
How large is the file? In my experience, reading an Excel file from the data lake and loading it as a DataFrame works great. Thanks On 2022-01-18 22:16, Heta Desai wrote: Hello, I have zip files on an SFTP location. I want to download/copy those files and put them into Azure Data Lake. Once the zip
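The SFTP download and Data Lake upload steps need external services and credentials (paramiko / Azure SDKs), but the unzip step in the middle is plain stdlib. A self-contained sketch that builds a sample zip standing in for the file fetched from SFTP, then extracts it to a staging folder; all paths here are temporary examples:

```python
# Stdlib sketch of the "unzip" step; the SFTP download and Data Lake
# upload are outside this snippet (they need paramiko / azure SDKs and
# real credentials). Paths here are temporary examples.
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    zip_path = os.path.join(workdir, "incoming.zip")

    # Stand-in for the zip file fetched from the SFTP location.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("data/report.csv", "id,value\n1,42\n")

    # Unzip into a staging folder before uploading members to the lake.
    staging = os.path.join(workdir, "staging")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(staging)
        members = zf.namelist()

    print(members, os.path.exists(os.path.join(staging, "data", "report.csv")))
```

From the staging folder, each extracted member can be uploaded to the lake individually, which avoids holding the whole archive in memory.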

Re: ivy unit test case filing for Spark

2021-12-21 Thread Wes Peng
Are you using IvyVPN, which may cause this problem? If the VPN software silently rewrites network URLs, you should avoid using it. Regards. On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar wrote: > Hi Spark Team > > I am building Spark in a VPN. But the unit test case below is failing. > This is