"machine learning feature engineering %my_programing_langauge%"
>>
>> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote:
>>
>>> @Eli, thanks for the suggestion. If you do not mind, could you please
>>> elaborate on the approaches?
Good luck
>
> On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org> wrote:
>
>> Hi,
>> I am new to Spark MLlib. I am using the FPGrowth model for finding
>> related items.
>>
>> The number of transactions is 63K, and the total number of items in all
>> [...]
Please guide me on how to reduce the execution time for generating
frequent items.
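
(A minimal tuning sketch, assuming the RDD-based
org.apache.spark.mllib.fpm.FPGrowth API and a hypothetical existing
RDD named transactions: raising minSupport prunes candidate itemsets
aggressively, and numPartitions spreads the conditional FP-tree work.)

    import org.apache.spark.mllib.fpm.FPGrowth

    // transactions: an existing RDD[Array[String]], one item array per basket
    val model = new FPGrowth()
      .setMinSupport(0.01)   // raising this shrinks the search space sharply
      .setNumPartitions(20)  // parallelism of the conditional FP-tree step
      .run(transactions)

    model.freqItemsets.take(20).foreach { fi =>
      println(fi.items.mkString("[", ",", "]") + " -> " + fi.freq)
    }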
--
Thanks,
Raju Bairishetti,
www.lazada.com
I had set the properties below, following SPARK-6910 and all of its
corresponding PRs:

    spark.sql.hive.convertMetastoreParquet false
    spark.sql.hive.metastorePartitionPruning true
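
(A sketch of one way to apply these, assuming a Spark 2.x SparkSession
with Hive support; the same keys can also be passed as --conf flags:)

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .config("spark.sql.hive.metastorePartitionPruning", "true")
      .getOrCreate()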
>
> Yong
>
>
> --
> From: Raju Bairishetti <r...@apache.org>
    def getPartitions(predicates: Seq[Expression]): Seq[HivePartition] =
      client.getPartitionsByFilter(this, predicates)

    lazy val allPartitions = table.getAllPartitions

But somehow getAllPartitions is getting called even after setting
metastorePartitionPruning to true.
Am I missing something, or am I looking in the wrong place?
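
(One way to check which path is taken, a sketch with a hypothetical table
events partitioned by dt, assuming a session named spark:)

    // With pruning working, the physical plan's HiveTableScan should list
    // only the dt='2017-01-10' partition, not every partition of the table.
    spark.sql("SELECT count(*) FROM events WHERE dt = '2017-01-10'").explain(true)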
On Mon, Jan 16, 2017 at 12:53 PM, Raju Bairishetti <r...@apache.org> wrote:
Waiting for suggestions/help on this...
On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti <r...@apache.org> wrote:
> Hello,
>
> Spark SQL is generating the query plan with information for all of the
> partitions, even when we apply partition filters in the query. Due to
> this, [...]
> Can someone tell what these numbers indicate in the case below?
>
> [Stage 2:> (44 + 48) / 21428]
>
> 44 + 48 and 21428.
>
> Thanks,
> Asmath
>
>
--
Thanks,
Raju Bairishetti,
www.lazada.com
[...] because of changing the serde from the Spark built-in serde to the
Hive serde.
I feel that fixing the query-plan generation in Spark SQL is the right
approach, instead of forcing users to use the Hive serde.
Is there any workaround/way to fix this issue? I would like to hear more
thoughts on this :)
--
Thanks,
>> The number of records for this dataframe is 4903764.
>>
>> I even increased the number of partitions from 10 to 20; still no luck.
>> Can anyone help me in resolving this performance issue?
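
(A minimal sketch of the repartitioning step being described, assuming a
DataFrame named df; caching is a further knob if several actions reuse
the same data:)

    // Sketch: more partitions => more parallel tasks in the heavy stage;
    // cache() helps only if the data feeds more than one action.
    val tuned = df.repartition(20).cache()
    tuned.count()   // materializes the cache once; later actions reuse it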
>>
>> Thanks,
>>
>> Asmath
>>
>>
>
--
Thanks,
Raju Bairishetti,
www.lazada.com
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 24 May 2016 at 11:[...], Raju Bairishetti <raju@gmail.com> wrote:
> I am using Spark SQL for running Hive queries as well. Is there any way
> to run Hive queries in async mode using Spark SQL?
>
> Does it return any Hive handle, and if yes, how do I get the results
> from the Hive handle using Spark SQL?
>
> --
I am using Spark SQL for running Hive queries as well. Is there any way to
run Hive queries in async mode using Spark SQL?
Does it return any Hive handle, and if yes, how do I get the results from
the Hive handle using Spark SQL?
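
(Spark SQL does not hand back a Hive handle; a common workaround is to run
the blocking call on another thread. A minimal sketch, assuming a session
named spark and a hypothetical table some_hive_table:)

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    // Sketch: the blocking sql(...).collect() runs on a separate thread;
    // Spark schedules jobs submitted from different threads concurrently.
    val resultF = Future {
      spark.sql("SELECT * FROM some_hive_table").collect()
    }

    // ... do other work here ...
    val rows = Await.result(resultF, 10.minutes)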
--
Thanks,
Raju Bairishetti,
www.lazada.com
> [...] immediately. Alternately, what we do sometimes is (see the sketch
> after this list):
>
> 1. Maintain a couple of iterations for some 30-40 seconds in the
>    application until we have substantial data, and then write them to disk.
> 2. Push the smaller data back to Kafka, and let a different job handle
>    the save to disk.
>
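
(A minimal sketch of option 1 above, assuming a DStream-based job with a
hypothetical stream: DStream[String] and output path: window() buffers
roughly 40 seconds of micro-batches, and coalesce() keeps the file count
per write small:)

    import org.apache.spark.streaming.Seconds

    stream.window(Seconds(40), Seconds(40)).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // One write per 40s window: a handful of files instead of many
        rdd.coalesce(4).saveAsTextFile("/data/out/" + System.currentTimeMillis)
      }
    }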
> On S
> [...] https://github.com/koeninger/kafka-exactly-once
>
> Regarding the small files problem, either don't use HDFS, or use something
> like filecrush for merging.
>
> On Fri, Jan 22, 2016 at 3:03 AM, Raju Bairishetti <r...@apache.org> wrote:
>
>> Hi,
>>
>> I am very [...]
[...] number of small files in HDFS. Having many
small files in HDFS leads to lots of other issues.
Is there any way to write multiple RDDs into a single file? I don't have
much idea about coalesce usage (see the sketch below). In the worst case,
I can merge all the small files in HDFS at regular intervals.
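
(A minimal sketch of coalesce for this, with a hypothetical output path:
it cuts the number of partitions, and therefore output files, before the
write:)

    // Sketch: one output file per save instead of one per partition;
    // coalesce narrows partitions without a full shuffle when shrinking.
    rdd.coalesce(1).saveAsTextFile("/data/out/" + System.currentTimeMillis)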
Thanks...
--
Thanks
Raju Bairishetti
www.lazada.com