Re: K8s-Spark client mode : Executor image not able to download application jar from driver

2019-04-27 Thread Nikhil Chinnapa
Hi Stavros,

Thanks a lot for pointing me in the right direction. I got stuck on a
release, so I didn't get time to reply earlier.

The mistake was in “LINUX_APP_RESOURCE”: I was using “local” when it should
have been “file”. I only got there thanks to your email.

What I understood (a minimal sketch of the matching configuration follows
below):
Driver image: needs $SPARK_HOME/bin, $SPARK_HOME/jars, and the application jar.
Executor image: just $SPARK_HOME/bin and the $SPARK_HOME/jars folder will
suffice, since the executors download the application jar from the driver.
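
For anyone who hits the same problem, here is a minimal sketch of a
client-mode session using the “file” scheme. The master URL, image name, and
jar path are illustrative placeholders, not the exact values from my setup:

import org.apache.spark.sql.SparkSession

object ClientModeApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("k8s-client-mode-demo")
      // placeholder API server address
      .master("k8s://https://kubernetes.default.svc:443")
      .config("spark.submit.deployMode", "client")
      // executor image only needs $SPARK_HOME/bin and $SPARK_HOME/jars
      .config("spark.kubernetes.container.image", "myrepo/spark-executor:latest")
      // "file" = serve the jar from the driver's filesystem to the executors;
      // "local" would assume the jar is already baked into the executor image
      .config("spark.jars", "file:///opt/app/my-app.jar")
      .getOrCreate()

    spark.range(10).count()
    spark.stop()
  }
}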







Spark SQL met "Block broadcast_xxx not found"

2019-04-27 Thread Xilang Yan
We met a broadcast issue in some of our applications, though not every time
we run them; it usually goes away when we rerun. In the exception log, I see
the two types of exception below:

Exception 1:
10:09:20.295 [shuffle-server-6-2] ERROR org.apache.spark.network.server.TransportRequestHandler - Error opening block StreamChunkId{streamId=365584526097, chunkIndex=0} for request from /10.33.46.33:19866
org.apache.spark.storage.BlockNotFoundException: Block broadcast_334_piece0 not found
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:361) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:61) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:60) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) ~[scala-library-2.11.0.jar:?]
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31) ~[scala-library-2.11.0.jar:?]
        at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:87) ~[spark-network-common_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:125) [spark-network-common_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103) [spark-network-common_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118) [spark-network-common_2.11-2.2.1.jar:2.2.1]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-all-4.0.23.Final.jar:4.0.23.Final]


Exception 2:
10:14:37.906 [Executor task launch worker for task 430478] ERROR org.apache.spark.util.Utils - Exception encountered
org.apache.spark.SparkException: Failed to get broadcast_696_piece0 of broadcast_696
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:178) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at scala.collection.immutable.List.foreach(List.scala:383) ~[scala-library-2.11.0.jar:?]
        at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:150) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:222) ~[spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) [spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206) [spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66) [spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66) [spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96) [spark-core_2.11-2.2.1.jar:2.2.1]
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) [spark-core_2.11-2.2.1.jar:2.2.1]


I think exception 2 is caused by exception 1, so the issue is that when
executor A tries to fetch a broadcast piece from executor B, executor B
cannot find it locally. This is strange, because broadcast blocks are stored
both in memory and on disk; they should be removed only when the driver asks,
and the driver removes a broadcast only once the broadcast variable is no
longer used. A sketch of the lifecycle I have in mind is below.
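
(A hedged sketch; the names and data are illustrative, not from our real
job. It shows the lifecycle I am assuming: removing a broadcast while a job
still reads it is one way to get exactly this error.)

import org.apache.spark.sql.SparkSession

object BroadcastLifecycle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-lifecycle").getOrCreate()
    val sc = spark.sparkContext

    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))

    val n = sc.parallelize(1 to 1000000)
      .map(i => lookup.value.getOrElse(i % 2 + 1, "?"))
      .count()
    println(n)

    // unpersist() drops the copies cached on executors; the driver keeps
    // its own copy, so later tasks can lazily re-fetch the pieces.
    lookup.unpersist()

    // destroy() removes all state on the driver and executors. Calling it
    // (or the ContextCleaner doing so once the variable is garbage-collected)
    // while a job is still reading lookup.value would produce
    // "Failed to get broadcast_N_piece0 of broadcast_N".
    lookup.destroy()

    spark.stop()
  }
}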

Could anyone give a clue on how to find the root cause of this issue?
Thanks a lot!









MapR-DB Spark Connector with Secondary Indexes

2019-04-27 Thread Mich Talebzadeh
First, as I understand it, MapR-DB is a proprietary (not open source) NoSQL
database that MapR offers, similar to HBase but with claimed better
performance. There are some speculative statements, as below:

https://hackernoon.com/mapr-db-spark-connector-with-secondary-indexes-df41909f28ea


"MapR Data Platform offers significant advantages over any other tool on
the big data space. MapR-DB is one of the core components of the platform
and it offers state of the art capabilities that blow away most of the
NoSQL databases out there"

OK, Spark has connectors for HBase, Aerospike, MongoDB etc., so no surprise
here. However, as I understand it, within MapR-DB one can create secondary
indexes, and Spark can take advantage of these to push filters down and
reduce the amount of data loaded into the RDD.

import org.apache.spark.sql.types.{StringType, StructField, StructType}
import com.mapr.db.spark.sql._  // brings loadFromMapRDB into scope on SparkSession

val schema = StructType(Seq(
  StructField("_id", StringType),
  StructField("uid", StringType)))

val data = sparkSession
  .loadFromMapRDB("/user/mapr/tables/data", schema)
  .filter("uid = '101'")
  .select("_id")

So apparently this load will be more efficient as long as the secondary
index has been created in MapR-DB on the filtering column.
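
One hedged way to verify this is to inspect the physical plan; for
DataSource-based connectors, a pushed-down predicate shows up in the scan
node (typically as PushedFilters):

// Look for the uid predicate inside the scan node rather than in a
// separate Filter stage applied after a full table scan.
data.explain(true)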

Also see this doc

https://mapr.com/docs/51/MapROverview/c_maprdb_new.html

It sounds like MapR-DB tries to be a third-party version of HBase and in
some ways mimics HDFS as well. I just don't see the point when one can use
Apache Phoenix with secondary indexes on HBase, which provides a relational
view of HBase.

Has anyone used this product?

There is some reference here as well

https://stackoverflow.com/questions/30254134/difference-between-mapr-db-and-hbase

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

