Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-14 Thread Raju Bairishetti
"machine learning feature engineering %my_programing_langauge%" >> >> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote: >> >>> @Eli, Thanks for the suggestion. If you do not mind can you please >>> elaborate approaches?

Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-06 Thread Raju Bairishetti
Good luck > > On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org> wrote: > >> Hi, >> I am new to Spark MLlib. I am using the FPGrowth model for finding related >> items. >> >> The number of transactions is 63K and the total number of items in all

FPGrowth Model is taking too long to generate frequent item sets

2017-03-05 Thread Raju Bairishetti
Please guide me: how can I reduce the execution time for generating frequent itemsets? -- Thanks, Raju Bairishetti, www.lazada.com
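
For reference, a minimal MLlib FPGrowth sketch in Scala, with toy baskets standing in for the real ~63K transactions; the biggest runtime lever is usually minSupport (a higher threshold prunes the itemset search space), while setNumPartitions spreads the conditional FP-tree work:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    val sc = new SparkContext(
      new SparkConf().setAppName("fpgrowth-sketch").setMaster("local[*]"))

    // Toy baskets standing in for the real transactions
    val transactions = sc.parallelize(Seq(
      Array("a", "b", "c"), Array("a", "b"), Array("b", "c"), Array("a", "c")))

    val model = new FPGrowth()
      .setMinSupport(0.5)   // raising minSupport shrinks the itemset search space
      .setNumPartitions(4)  // parallelism for the conditional FP-tree phase
      .run(transactions)

    model.freqItemsets.collect().foreach { is =>
      println(is.items.mkString("[", ",", "]") + " -> " + is.freq)
    }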

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
SPARK-6910 and all corresponding PRs. *spark.sql.hive.convertMetastoreParquet false* *spark.sql.hive.metastorePartitionPruning true* *I had set the above properties per SPARK-6910 & its PRs.* > > Yong > > > -- > *From:* Raju Bairishetti <r...@
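
A minimal sketch of setting those two properties at session creation (using the Spark 2.x SparkSession API; property names as quoted above):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pruning-sketch")
      .enableHiveSupport()
      // read through the Hive serde instead of Spark's built-in Parquet path
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      // ask the metastore only for partitions matching the pushed-down filters
      .config("spark.sql.hive.metastorePartitionPruning", "true")
      .getOrCreate()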

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
] = client.getPartitionsByFilter(this, predicates) lazy val allPartitions = table.getAllPartitions But somehow getAllPartitions is getting called even after setting metastorePartitionPruning to true. Am I missing something, or am I looking at the wrong place? On Mon, Jan 16, 2017 at 12:53 PM, Raju Bairishetti

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-15 Thread Raju Bairishetti
Waiting for suggestions/help on this... On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti <r...@apache.org> wrote: > Hello, > > Spark SQL is generating the query plan with all partitions' information even > when we apply filters on partitions in the query. Due to this, s

Re: Spark Log information

2017-01-15 Thread Raju Bairishetti
one tell > what do these numbers indicate in the below case. > > [Stage 2:> (44 + 48) / 21428] > > 44+48 and 21428. > > Thanks, > Asmath > > -- Thanks, Raju Bairishetti, www.lazada.com

Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-10 Thread Raju Bairishetti
because of changing the serde from spark-builtin to hive serde. I feel like *fixing query plan generation in spark-sql* is the right approach instead of forcing users to use the hive serde. Is there any workaround/way to fix this issue? I would like to hear more thoughts on this :) -- Thanks,
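
One way to check whether pruning actually kicked in, reusing the session sketched earlier (the table and its partition column are hypothetical):

    // `dt` is a hypothetical partition column of a hypothetical `logs` table
    val df = spark.sql("SELECT id, value FROM logs WHERE dt = '2017-01-10'")
    df.explain(true)  // with pruning working, the plan should mention only dt=2017-01-10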

Re: Spark Dataframe: Save to hdfs is taking long time

2016-12-28 Thread Raju Bairishetti
records for this dataframe is >> 4903764 >> >> I even increased the number of partitions from 10 to 20; still no luck. Can >> anyone help me in resolving this performance issue? >> >> Thanks, >> >> Asmath >> >> > -- Thanks, Raju Bairishetti, www.lazada.com
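
A minimal write sketch at those numbers (stand-in DataFrame, hypothetical HDFS path); fewer, larger output files are usually faster to produce than many small ones, and coalesce() avoids the full shuffle that repartition() triggers when reducing partitions:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("write-sketch").getOrCreate()
    val df = spark.range(4903764).toDF("id")  // stand-in for the real DataFrame

    df.coalesce(10)                   // fewer, larger files; no full shuffle
      .write
      .mode(SaveMode.Overwrite)
      .parquet("hdfs:///tmp/df_out")  // hypothetical output path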

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Raju Bairishetti
> > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw* > > > > http://talebzadehmich.wordpress.com > > > > On 24 May 2016 at 11:

Re: How to run hive queries in async mode using spark sql

2016-05-24 Thread Raju Bairishetti
Bairishetti <raju@gmail.com> wrote: > I am using spark sql for running hive queries also. Is there any way to > run hive queries in async mode using spark sql? > > Does it return any hive handle, and if yes, how do I get the results from the hive > handle using spark sql? > > --

How to run hive queries in async mode using spark sql

2016-05-17 Thread Raju Bairishetti
I am using spark sql for running hive queries also. Is there any way to run hive queries in async mode using spark sql? Does it return any hive handle, and if yes, how do I get the results from the hive handle using spark sql? -- Thanks, Raju Bairishetti, www.lazada.com
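
As far as I know, Spark SQL does not expose a HiveServer2-style async handle; a common workaround is to dispatch the query on a separate thread and treat the Future as the handle. A minimal sketch (Spark 2.x API; the table name is hypothetical):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}
    import org.apache.spark.sql.{Row, SparkSession}

    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

    val spark = SparkSession.builder()
      .appName("async-sql-sketch").enableHiveSupport().getOrCreate()

    // the Future acts as the "handle": poll it, compose it, or await it
    val result: Future[Array[Row]] = Future {
      spark.sql("SELECT count(*) FROM some_hive_table").collect()
    }
    result.foreach(rows => rows.foreach(println))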

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-24 Thread Raju Bairishetti
immediately. Alternatively, what we do sometimes is: > > 1. Maintain a couple of iterations for some 30-40 seconds in the application > until we have substantial data, and then write it to disk. > 2. Push smaller data back to kafka, and a different job handles the save > to disk. > > On S

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-23 Thread Raju Bairishetti
github.com/koeninger/kafka-exactly-once > > Regarding the small files problem, either don't use HDFS, or use something > like filecrush for merging. > > On Fri, Jan 22, 2016 at 3:03 AM, Raju Bairishetti <r...@apache.org> wrote: > >> Hi, >> >> >> I am very
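
A minimal sketch of the pattern from that repo: persist offsets yourself (ZooKeeper, HDFS, a DB) and seed the direct stream with them instead of relying on the checkpoint (Spark 1.x Kafka 0.8 direct API; broker, topic, and offset values are hypothetical):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("offsets-sketch"), Seconds(30))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    // offsets saved from the previous run, keyed by topic/partition
    val fromOffsets = Map(TopicAndPartition("events", 0) -> 12345L)
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)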

[Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-22 Thread Raju Bairishetti
number of small files in HDFS. Having small files in HDFS will lead to lots of other issues. Is there any way to write multiple RDDs into a single file? I don't have much idea about *coalesce* usage. In the worst case, I can merge all the small files in HDFS at regular intervals. Thanks... -- Thanks Raju Bairishetti www.lazada.com
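
On the single-file question: a sketch that collapses each micro-batch to one partition before writing, so one file lands per interval instead of one per task (reusing the direct stream sketched above; the output path is hypothetical):

    stream.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.map(_._2)    // keep just the message payload
           .coalesce(1)  // one partition => one output file per batch
           .saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
      }
    }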