Spark MLlib logistic regression setWeightCol illegal argument exception

2020-01-09 Thread Patrick
Hi Spark Users, I am trying to solve a class imbalance problem. I found that Spark supports setting a weight column in its API, but I get an IllegalArgumentException saying the weight column does not exist, even though it does exist in the dataset. Any recommendations on how to approach this problem? I am using Pipeline API with
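For reference, a minimal sketch of how a weight column is typically attached for an imbalanced label before fitting. It assumes a label column named `label`, a vector column named `features`, and a hypothetical weight column `classWeight`. The helper names `balanced_class_weights` and `build_weighted_lr` are illustrative and do not come from the thread, and the inverse-frequency weighting is one common choice, not necessarily what the poster used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def build_weighted_lr(df, label_col="label", weight_col="classWeight"):
    """Attach a per-row weight column to `df` and return (weighted_df, estimator).

    Requires pyspark; it is imported lazily so the helper above works without it.
    Column names are assumptions for illustration.
    """
    from pyspark.sql import functions as F
    from pyspark.ml.classification import LogisticRegression

    # Collect class counts on the driver (fine for a small number of classes).
    counts = {
        row[label_col]: row["count"]
        for row in df.groupBy(label_col).count().collect()
    }
    n_samples = sum(counts.values())
    n_classes = len(counts)

    # Build a chained when(...) expression mapping each label to its weight.
    weight_expr = None
    for label, count in counts.items():
        weight = n_samples / (n_classes * count)
        cond = F.col(label_col) == label
        if weight_expr is None:
            weight_expr = F.when(cond, F.lit(weight))
        else:
            weight_expr = weight_expr.when(cond, F.lit(weight))

    weighted_df = df.withColumn(weight_col, weight_expr)

    # The estimator only stores the column name; the column itself must be
    # present on whatever DataFrame is later passed to fit().
    lr = LogisticRegression(
        featuresCol="features", labelCol=label_col, weightCol=weight_col
    )
    return weighted_df, lr
```

The weight column has to exist on the DataFrame that is actually passed to `Pipeline.fit`, so printing `df.columns` immediately before fitting is a quick way to confirm it was not dropped by an earlier `select`.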

Re: Merge multiple different s3 logs using pyspark 2.4.3

2020-01-09 Thread Gourav Sengupta
Hi Shraddha, what is interesting to me is that people do not even have the courtesy to write their name when they ask user groups for help :) Your solution is spot on; there is another option available in Spark SQL for this, though. Regards, Gourav Sengupta On Thu, Jan 9, 2020 at 1:19 PM

Re: Merge multiple different s3 logs using pyspark 2.4.3

2020-01-09 Thread Shraddha Shah
Unless I am reading this wrong, this can be achieved with aws s3 sync? aws s3 sync s3://my-bucket/ingestion/source1/y=2019/m=12/d=12 s3://my-bucket/ingestion/processed/src_category=other/y=2019/m=12/d=12 Thanks, -Shraddha On Thu, Jan 9, 2020 at 7:05 AM Gourav Sengupta wrote: > why s3a? > >

Re: Merge multiple different s3 logs using pyspark 2.4.3

2020-01-09 Thread Gourav Sengupta
why s3a? On Thu, Jan 9, 2020 at 2:20 AM anbutech wrote: > Hello, > > version = spark 2.4.3 > > I have JSON log data from 3 different sources that share the same schema (same > column order) in the raw data, and I want to add one new column, > "src_category", for all 3 sources to
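As a rough sketch of the PySpark side of this question: tag each source with a literal `src_category` value and union the results. The function names, the `sources` mapping, and the partition layout are assumptions for illustration (the layout mirrors the `src_category=other/y=2019/m=12/d=12` path quoted elsewhere in the thread). The original message is truncated, so the write step is left out.

```python
from functools import reduce


def output_path(base, category, year, month, day):
    """Build a partition-style path such as .../src_category=other/y=2019/m=12/d=12."""
    return (
        f"{base.rstrip('/')}/src_category={category}"
        f"/y={year}/m={month:02d}/d={day:02d}"
    )


def tag_and_merge(spark, sources):
    """Read each JSON source, add a src_category column, and union them.

    `sources` is a hypothetical dict mapping a category value to an input path.
    Requires pyspark; it is imported lazily so output_path works without it.
    """
    from pyspark.sql import functions as F

    tagged = [
        spark.read.json(path).withColumn("src_category", F.lit(category))
        for category, path in sources.items()
    ]
    # unionByName matches columns by name rather than by position.
    return reduce(lambda left, right: left.unionByName(right), tagged)
```

`unionByName` is used here instead of `union` so that the merge does not rely on the three sources keeping an identical column order.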