Testing Spark e-mail list
RE: Hive From Spark: Jdbc VS sparkContext
Testing Spark group e-mail
Spark SQL truncate function
I saw that it is possible to truncate a date with the trunc function by MM or YY, but it is not possible to truncate by WEEK, HOUR, or MINUTE. Am I right? Is there any objection to supporting this, or is it just not implemented yet? Thanks, David
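A minimal workaround sketch, assuming a Spark 1.6-style SQLContext where trunc() only accepts 'YY'/'MM': hour and minute truncation can be emulated by flooring the epoch seconds, and week truncation via next_day/date_sub; later releases (Spark 2.3+) added a date_trunc function that covers WEEK, HOUR, and MINUTE directly. The column names below are illustrative, not from this thread:

  import org.apache.spark.sql.functions._
  import sqlContext.implicits._   // assumes a SQLContext named sqlContext

  // Illustrative data with an assumed timestamp column "ts".
  val df = Seq("2016-12-21 14:43:27", "2016-12-21 15:02:11").toDF("ts_str")
    .withColumn("ts", $"ts_str".cast("timestamp"))

  val truncated = df
    // Floor the epoch seconds to the hour / minute, then convert back.
    .withColumn("ts_hour",   from_unixtime((unix_timestamp($"ts") / 3600).cast("long") * 3600))
    .withColumn("ts_minute", from_unixtime((unix_timestamp($"ts") / 60).cast("long") * 60))
    // Truncate to the Monday that starts the week.
    .withColumn("ts_week",   date_sub(next_day($"ts".cast("date"), "Mon"), 7))

  truncated.show(false)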
Spark Storage Tab is empty
I have tried the following code but didn't see anything on the Storage tab.

  val myrdd = sc.parallelize(1 to 100)
  myrdd.setName("my_rdd")
  myrdd.cache()
  myrdd.collect()

The Storage tab is empty, although I can see the stage for collect(). I am using Spark 1.6.2, HDP 2.5, and Spark on YARN. Thanks, David
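For what it's worth, persistence can also be checked from the driver without the UI. A minimal sketch, assuming the parallelize typo above was only a transcription error; note that cache() is lazy, so the RDD only becomes visible after an action such as collect() has materialized it:

  val myrdd = sc.parallelize(1 to 100)
  myrdd.setName("my_rdd")
  myrdd.cache()
  myrdd.collect()   // triggers the job and materializes the cached blocks

  // Lists every persisted RDD known to this SparkContext, independent of the UI.
  sc.getPersistentRDDs.values.foreach { r =>
    println(s"id=${r.id} name=${r.name} level=${r.getStorageLevel}")
  }

If getPersistentRDDs shows the RDD but the Storage tab stays blank, the problem is more likely on the UI side than with caching itself.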
Re: SPARK-SQL Understanding BroadcastNestedLoopJoin and number of partitions
Do you know who I can talk to about this code? I am really curious to know why there is a union here and why the number of partitions of the join is the sum of both inputs. I expected the number of partitions to be the same as the streamed table, or in the worst case the two multiplied. Sent from my iPhone

On Dec 21, 2016, at 14:43, David Hodeffi <david.hode...@niceactimize.com> wrote: (original message quoted in full below)
SPARK-SQL Understanding BroadcastNestedLoopJoin and number of partitions
I have two DataFrames which I am joining, one small and one big. The optimizer suggests using BroadcastNestedLoopJoin. The number of partitions for the big DataFrame is 200 while the small DataFrame has 5 partitions. The joined DataFrame ends up with 205 partitions (joined.rdd.partitions.size). I tried to understand where this number comes from and figured out that BroadcastNestedLoopJoin actually performs a union. The code is roughly:

  case class BroadcastNestedLoopJoin(...) {
    def doExecute(): RDD[InternalRow] = {
      ...
      sparkContext.union(
        matchedStreamRows,
        sparkContext.makeRDD(notMatchedBroadcastRows)
      )
    }
  }

Can someone please explain what exactly doExecute() does? Can you elaborate on all the null checks and why we can have nulls? Why do we have 205 partitions? A link to a JIRA with a discussion that explains the code would help.
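A hedged sketch of how the partition count can be inspected, assuming that a non-equi join condition plus an explicit broadcast hint is enough to make the planner choose BroadcastNestedLoopJoin; the sizes and column names are illustrative, not from this thread. As far as one can tell from the code quoted above, the union and the null handling exist so that outer joins can still emit broadcast-side rows that matched nothing on the streamed side, padded with nulls for the streamed columns:

  import org.apache.spark.sql.functions.broadcast

  val big   = sqlContext.range(0L, 1000000L).repartition(200)      // 200 partitions, column "id"
  val small = sqlContext.range(0L, 10L).repartition(5).toDF("k")   // 5 partitions

  // A non-equi condition rules out hash joins; the broadcast hint steers the
  // planner towards BroadcastNestedLoopJoin.
  val joined = big.join(broadcast(small), big("id") > small("k"))

  println(big.rdd.partitions.length)      // 200
  println(small.rdd.partitions.length)    // 5
  println(joined.rdd.partitions.length)   // streamed partitions plus the partitions of
                                          // sparkContext.makeRDD(notMatchedBroadcastRows);
                                          // reported as 200 + 5 = 205 in this thread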
RE: Launching multiple spark jobs within a main spark job.
I am not aware of any problem with that. In any case, a Spark application normally runs multiple jobs anyway, so it makes sense that this is not a problem. Thanks, David

From: Naveen [mailto:hadoopst...@gmail.com]
Sent: Wednesday, December 21, 2016 9:18 AM
To: d...@spark.apache.org; user@spark.apache.org
Subject: Launching multiple spark jobs within a main spark job.

Hi Team, Is it OK to spawn multiple Spark jobs within a main Spark job? My main Spark job's driver, which was launched on a YARN cluster, will do some preprocessing, and based on that it needs to launch multiple Spark jobs on the YARN cluster. Not sure if this is the right pattern. Please share your thoughts. Sample code I have is as below for better understanding:

  object MainSparkJob {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(...)
      // fetch from Hive using HiveContext
      // fetch from HBase

      // spawning multiple Futures
      val future1 = Future {
        val sparkJob = new SparkLauncher(...).launch()
        sparkJob.waitFor()
      }
      // similarly, future2 to futureN
      future1.onComplete { ... }
    }
  } // end of MainSparkJob
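A minimal, self-contained sketch of this pattern, assuming SparkLauncher is used to start the child applications as separate processes and Scala Futures are used to run them concurrently; the jar path, class names, and master below are placeholder assumptions, not values from this thread:

  import org.apache.spark.launcher.SparkLauncher
  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration._

  // Launches one child Spark application and resolves with its exit code.
  def launchChild(mainClass: String): Future[Int] = Future {
    val process = new SparkLauncher()
      .setAppResource("/path/to/child-job.jar")   // placeholder jar
      .setMainClass(mainClass)                    // placeholder class name
      .setMaster("yarn")
      .setDeployMode("cluster")
      .launch()
    process.waitFor()
  }

  val children  = Seq("com.example.JobA", "com.example.JobB").map(launchChild)
  val exitCodes = Await.result(Future.sequence(children), 2.hours)
  println(s"child exit codes: $exitCodes")

One design consideration: each launch() forks a separate spark-submit process, so the children compete with the parent application for YARN resources, while the parent driver only needs enough memory and threads to supervise the futures.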