Re: JavaSerializerInstance is slow

2021-09-03 Thread Sean Owen
I don't know that Java serialization is slow in that case; the dump shows
blocking on a class load, which may or may not be directly due to
deserialization.
Indeed, I don't think (some) things are serialized in local mode within one
JVM, so I'm not sure that's actually what's going on.

On Thu, Sep 2, 2021 at 11:58 PM Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu> wrote:

> Hi Kohki,
>
> Serialization of tasks happens in local mode too, and as far as I am
> aware there is no way to disable it (although that would definitely be
> useful in my opinion).
>
> You can think of local mode as a testing mode, in which you want to
> catch any serialization errors before they appear in production.
>
> There are also some important bugs that are present in local mode and
> are not deemed worth fixing because it is not intended to be used in
> production (https://issues.apache.org/jira/browse/SPARK-5300).
>
> I think there would definitely be interest in having a reliable and
> efficient local mode in Spark but it's a pretty different use case than
> what Spark originally focused on.
>
> Antonin
>
> On 03/09/2021 05:56, Kohki Nishio wrote:
> > I'm seeing many threads doing deserialization of a task. I understand
> > that since lambdas are involved, we can't use Kryo for those purposes.
> > However, I'm running in local mode, so this serialization is not really
> > necessary, no?
> >
> > Is there any trick I can apply to get rid of this thread contention?
> > I'm seeing many threads like the one below in thread dumps...
> >
> >
> > "Executor task launch worker for task 11.0 in stage 15472514.0 (TID
> > 19788863)" #732821 daemon prio=5 os_prio=0 tid=0x7f02581b2800
> > nid=0x355d waiting for monitor entry [0x7effd1e3f000]
> >java.lang.Thread.State: BLOCKED (on object monitor)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:400)
> > - waiting to lock <0x7f0f7246edf8> (a java.lang.Object)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> > at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:51)
> > at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
> >
> >
> > Thanks
> > -Kohki
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
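
To make the points above concrete, here is a minimal sketch (class and field
names are illustrative): task closures are serialized even under
master("local[*]"), and setting spark.serializer to Kryo does not change
that, since Spark always serializes task closures with Java serialization.

import org.apache.spark.sql.SparkSession

class LocalModeDemo {
  // Not Serializable; referencing it inside a task closure captures
  // the enclosing (also non-Serializable) LocalModeDemo instance.
  private val socket = new java.net.Socket()

  def run(): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-mode-ser-demo")
      // Kryo applies only to data (shuffle, broadcast, caching);
      // task closures still go through Java serialization.
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Throws org.apache.spark.SparkException: Task not serializable,
    // even though everything runs in a single JVM in local mode.
    spark.sparkContext.parallelize(1 to 10).map(i => socket.hashCode + i).collect()
  }
}

Calling new LocalModeDemo().run() fails at closure-serialization time, which
is exactly the class of error local mode lets you catch before production.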


ApacheCon is just 3 weeks away!

2021-09-03 Thread Rich Bowen
[You are receiving this email because you are subscribed to the user
list of one or more Apache projects.]


Dear Apache enthusiast,

ApacheCon is our annual convention, featuring content related to our 
many software projects. This year, it will be held on September 21-23.


Registration is free this year, and since it’s online, you can attend 
from the comfort of your home or office.


Details about the event, including the schedule and registration, are 
available on the event website at https://apachecon.com/acah2021/


We hope you’ll consider attending this year; you’ll see content in
14 tracks, including: API & Microservice; Big Data: Ozone; Big Data:
SQL/NoSQL; Big Data: Streaming; Cassandra; Community; Content Delivery;
Content Management; Federated Data; Fineract & Fintech; Geospatial;
Groovy; Highlights; Incubator; Integration; Internet of Things;
Observability; Search; Tomcat.


We will also feature keynotes from Ashley Wolf, Mark Cox, Alison Parker 
and Michael Weinberg.


Details on the schedule and the keynotes can be found at
https://www.apachecon.com/acah2021/tracks/


We look forward to seeing you at this year’s ApacheCon!

– Rich Bowen, for the ApacheCon Planners

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Unsubscribe

2021-09-03 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



spark streaming to jdbc

2021-09-03 Thread igyu
val lines = spark.readStream
  .format("socket")
  //  .schema(StructType(schemas))
  .option("host", "10.3.87.23")
  .option("port", )
  .load()
  .selectExpr("CAST(value AS STRING)").as[String]

val DF = lines.map(x => {
  val obj = JSON.parseObject(x)
  val ls = new util.ArrayList()
  (obj.getString("timestamp"), obj.getString("msg"))
}).toDF("timestamp", "msg")

val q = DF.writeStream
  .foreachBatch((row, l) => {
    if (row.count() != 0) {
      row.write.format("jdbc")
        .mode(mode)
        .options(cfg)
        .save()
    }
  })
  .start()

I get an error:

Logical Plan:
Project [_1#10 AS timestamp#13, _2#11 AS msg#14]
+- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, false) AS _1#10, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#11]
   +- MapElements , class java.lang.String, [StructField(value,StringType,true)], obj#9: scala.Tuple2
      +- DeserializeToObject cast(value#2 as string).toString, obj#8: java.lang.String
         +- Project [cast(value#0 as string) AS value#2]
            +- StreamingExecutionRelation TextSocketV2[host: 10.3.87.23, port: ], [value#0]

at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
at java.util.regex.Matcher.reset(Matcher.java:309)
at java.util.regex.Matcher.<init>(Matcher.java:229)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at scala.util.matching.Regex.findFirstIn(Regex.scala:388)
at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2695)
at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2695)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2695)
at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2693)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.util.Utils$.redact(Utils.scala:2693)
at org.apache.spark.util.Utils$.redact(Utils.scala:2660)
at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:2071)
at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:2071)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.internal.SQLConf.redactOptions(SQLConf.scala:2071)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.simpleString(SaveIntoDataSourceCommand.scala:52)
at org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:177)
at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:548)
at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:472)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:197)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:197)
at org.apache.spark.sql.execution.QueryExecution.stringOrError(QueryExecution.scala:99)
at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:197)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:75)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at com.join.jdbc.writer.JdbcWriter$$anonfun$1.apply(JdbcWriter.scala:46)
at com.join.jdbc.writer.JdbcWriter$$anonfun$1.apply(JdbcWriter.scala:41)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:534)
at
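
For what it's worth, the NullPointerException is thrown inside Utils.redact,
which Spark invokes while formatting the JDBC options for the plan string
above; in Spark 2.x that method applies a regex to each option's key and
value, so a null value anywhere in the options map fails before the write
begins. A minimal guard along those lines (a sketch only: cfg, mode, and DF
are the names from the snippet above, cfg is assumed to be a
Map[String, String], and a null-valued option is an assumed cause, not a
confirmed one):

import org.apache.spark.sql.DataFrame

// Drop null-valued entries so Spark's option redaction never sees a null.
val safeCfg = cfg.filter { case (_, v) => v != null }

val q = DF.writeStream
  .foreachBatch((batch: DataFrame, batchId: Long) => {
    // isEmpty avoids counting the whole micro-batch just to test for rows.
    if (!batch.isEmpty) {
      batch.write
        .format("jdbc")
        .mode(mode)
        .options(safeCfg)
        .save()
    }
  })
  .start()

Printing cfg before starting the query is a quick way to confirm whether one
of its values is actually null.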