Hi All,
What determines the batch size when reading a file from HDFS?
I am trying to read files from HDFS and ingest them into Kafka using Spark
Structured Streaming 2.3.1. I get an error saying the Kafka batch size is too
big and that I need to increase max.request.size. Sure, I can increase it
but
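The question above is cut off, but as a hedged sketch of the two knobs usually involved (not the original poster's job): the file source's maxFilesPerTrigger caps how much data each micro-batch reads, and producer settings such as max.request.size reach the Kafka sink through options prefixed with "kafka.". Paths, brokers, topic, and values below are placeholders.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("hdfs-to-kafka").getOrCreate()

// Cap how many files each micro-batch picks up so a single batch does not
// outgrow the producer's max.request.size (value here is hypothetical).
val lines = spark.readStream
  .format("text")
  .option("maxFilesPerTrigger", 1)
  .load("hdfs:///data/input")            // placeholder path

// Producer properties are forwarded to Kafka when prefixed with "kafka.".
val query = lines.selectExpr("CAST(value AS STRING) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")              // placeholder
  .option("kafka.max.request.size", (10 * 1024 * 1024).toString)  // hypothetical 10 MB
  .option("topic", "ingest")                                      // placeholder topic
  .option("checkpointLocation", "hdfs:///tmp/checkpoints/ingest") // placeholder
  .start()

query.awaitTermination()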
Hello,
I am using 2.4, and it works:
scala> val df_A = Seq(("1", 10.0), ("2", 20.0), ("3", 30.0), ("4", 40.0), ("5", 50.0), ("6", 60.0), ("7", 70.0), ("8", 80.0), ("9", 90.0), ("10", 10.0)).toDF("id", "val")
df_A: org.apache.spark.sql.DataFrame = [id: string, val: double]
scala> val df_B=Seq(("11",
Hi Lian,
Since you are using repartition(1), do you want to decrease the number of
partitions? If so, have you tried using coalesce instead?
Kathleen
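As a hedged illustration of that suggestion (not the original job), a minimal sketch assuming the goal is a single output file: coalesce(1) merges existing partitions without the full shuffle that repartition(1) triggers, and Spark 2.x can write CSV natively, so the com.databricks.spark.csv package is not needed. The output path is a placeholder.

// coalesce(1) avoids the full shuffle that repartition(1) performs.
df.coalesce(1)
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv("hdfs:///tmp/output")   // placeholder path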
On Fri, Mar 22, 2019 at 2:43 PM Lian Jiang wrote:
> Hi,
>
> Writing a csv to HDFS takes about 1 hour:
>
>
>
Is it also slow when you do not repartition? (i.e., to get multiple
output files)
Also, did you try simply saveAsTextFile?
Also, before repartition, how many partitions are there?
a.
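A minimal sketch of how one might answer those questions, assuming df is the DataFrame being written; getNumPartitions reports the partition count before repartitioning, and the saveAsTextFile variant goes through the RDD API. The path and the naive comma join are illustrative only.

// How many partitions exist before the repartition(1)?
println(s"partitions before repartition: ${df.rdd.getNumPartitions}")

// Plain RDD text output, bypassing the DataFrame CSV writer entirely
// (no quoting or escaping, so only suitable for a quick comparison).
df.rdd
  .map(_.mkString(","))
  .saveAsTextFile("hdfs:///tmp/output_txt")   // placeholder path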
On 22/3/19 23:34, Lian Jiang wrote:
Hi,
Writing a csv to HDFS takes about 1 hour:
df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)
The generated csv file is only about 150kb. The job uses 3 containers (13
cores, 23g mem).
Other people have similar issues but I don't
What is the size of your data and of your cluster? Are you using
spark-submit or an IDE, and which Spark version are you using?
Try spark-submit and increase the memory of the driver or the executors.
a.
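For reference, a sketch of where those memory settings could go. With spark-submit they are usually passed as --driver-memory and --executor-memory flags; the equivalent properties can also be set when building the session, with the caveat that driver memory only takes effect if set before the driver JVM starts. The values below are hypothetical.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("csv-write")
  .config("spark.executor.memory", "8g")   // hypothetical value
  .config("spark.driver.memory", "4g")     // effective only when set before the driver JVM starts
  .getOrCreate()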
On 22/3/19 17:19, KhajaAsmath Mohammed wrote:
Hi,
I am getting the below exception when
Try `val enconder = RowEncoder(df.schema).resolveAndBind()` ?
On Thu, Mar 21, 2019 at 5:39 PM Long, Andrew wrote:
> Thanks a ton for the help!
>
>
>
> Is there a standardized way of converting the internal row to rows?
>
>
>
> I've tried this, but I'm getting an exception:
>
>
>
> val enconder =
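A minimal sketch of the conversion being discussed, assuming Spark 2.x, where the encoder returned by RowEncoder exposes fromRow for turning an InternalRow back into an external Row; the schema and the sample row below are illustrative, not from the thread.

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

// Illustrative schema; in the thread it would come from df.schema.
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("val", DoubleType)))

// resolveAndBind() binds the encoder to the schema so it can deserialize
// InternalRows into external Rows.
val enconder = RowEncoder(schema).resolveAndBind()

// String fields inside an InternalRow are stored as UTF8String.
// fromRow is the Spark 2.x API; Spark 3 replaced it with createDeserializer().
val internal: InternalRow = InternalRow(UTF8String.fromString("1"), 10.0)
val external: Row = enconder.fromRow(internal)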
Did you include the whole error message?
On Fri, Mar 22, 2019 at 12:45 AM 563280193 <563280...@qq.com> wrote:
> Hi ,
> I ran a Spark SQL query like this:
>
> select imei, tag, product_id,
>   sum(case when succ1>=1 then 1 else 0 end) as succ,
>   sum(case when fail1>=1 and succ1=0 then 1
Hi,
I am getting the below exception when using Spark KMeans. Any suggestions
from the experts would be really helpful.
val kMeans = new KMeans().setK(reductionCount).setMaxIter(30)
val kMeansModel = kMeans.fit(df)
The error occurs when calling kMeans.fit:
Exception in thread "main"
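The exception text above is cut off, so the actual cause isn't visible, but one common requirement worth checking: KMeans.fit expects a Vector column (named "features" by default). A hedged sketch of assembling one with VectorAssembler; the input column names are made up.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

// Pack the numeric columns into the single "features" vector KMeans expects.
// "x1", "x2", "x3" are placeholder column names.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2", "x3"))
  .setOutputCol("features")

val featureDf = assembler.transform(df)

val kMeans = new KMeans().setK(reductionCount).setMaxIter(30)
val kMeansModel = kMeans.fit(featureDf)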
Hi,
It seems that there is a syntax error in your SQL. You should use `select
... from ... group by ... having sum(...)=0 and sum(...)>0` instead of
`where sum(...)=0...`. I think aggregate functions are not allowed in WHERE.
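A hedged sketch of that rewrite as it might be run through spark.sql, matching the Scala used elsewhere in these threads; the GROUP BY keys and the second HAVING condition are assumptions reconstructed from the reply's sum(...)=0 and sum(...)>0 pattern, since the original query is truncated.

// Aggregate predicates belong in HAVING, not WHERE.
val result = spark.sql("""
  SELECT imei, tag, product_id,
         SUM(CASE WHEN succ1 >= 1 THEN 1 ELSE 0 END) AS succ,
         SUM(CASE WHEN fail1 >= 1 AND succ1 = 0 THEN 1 ELSE 0 END) AS fail,
         COUNT(*) AS cnt
  FROM t_tbl
  GROUP BY imei, tag, product_id               -- assumed grouping keys
  HAVING SUM(CASE WHEN succ1 >= 1 THEN 1 ELSE 0 END) = 0
     AND SUM(CASE WHEN fail1 >= 1 AND succ1 = 0 THEN 1 ELSE 0 END) > 0   -- assumed second condition
""")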
Hi ,
I ran a Spark SQL query like this:
select imei,tag, product_id,
sum(case when succ1>=1 then 1 else 0 end) as succ,
sum(case when fail1>=1 and succ1=0 then 1 else 0 end) as fail,
count(*) as cnt
from t_tbl
where sum(case when succ1>=1 then 1 else 0 end)=0