Adding OpenSearch as a secondary index provider to SparkSQL

2023-03-24 Thread Anirudha Jadhav
Hello community, I wanted your opinion on this implementation demo.

Support for materialized views, skipping indices, and covered indices with
bloom filter optimizations in OpenSearch via SparkSQL:

https://github.com/opensearch-project/sql/discussions/1465
(see the video with voice-over)
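For list readers who don't click through: a skipping index keeps a small per-file summary (here, a bloom filter) so the engine can prune files that cannot possibly match a point predicate. A minimal sketch in plain Python under assumed names, not the actual opensearch-project/sql implementation:

```python
# Hypothetical sketch of bloom-filter data skipping (illustration only,
# not the opensearch-project/sql code): each "file" keeps a bloom filter
# over a column; a point query skips files whose filter says "definitely
# absent" and scans only the rest.
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, value):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        # False -> definitely absent; True -> possibly present.
        return all(self.bits >> p & 1 for p in self._positions(value))


# One filter per file; the secondary index stores only the filters.
files = {"part-0": [1, 5, 9], "part-1": [20, 33, 47]}
index = {}
for name, rows in files.items():
    bf = BloomFilter()
    for r in rows:
        bf.add(r)
    index[name] = bf

# Query for value 33: only files whose filter matches get scanned.
to_scan = [n for n, bf in index.items() if bf.might_contain(33)]
print(to_scan)  # includes "part-1"; "part-0" is almost surely skipped
```

The same idea underlies a covered index: if the summary (or index) already holds every column the query needs, the base files are never touched at all.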

Ani
-- 
Anirudha P. Jadhav


Re: spark worker on mesos slave | possible networking config issue

2015-03-25 Thread Anirudha Jadhav
Is there a way to have this dynamically pick the local IP?

Static assignment does not work because the workers are dynamically allocated
on Mesos.
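One common way to avoid a static assignment is to compute the address at startup inside spark-env.sh. A minimal sketch, assuming a Linux slave where `hostname -I` is available, with a loopback fallback otherwise:

```shell
# spark-env.sh sketch: pick the first non-loopback IPv4 address at startup
# instead of hardcoding one. Assumes Linux `hostname -I`; falls back to
# 127.0.0.1 when the flag is unavailable or returns nothing.
LOCAL_IP=$(hostname -I 2>/dev/null | awk '{print $1}')
export SPARK_LOCAL_IP="${LOCAL_IP:-127.0.0.1}"
echo "SPARK_LOCAL_IP=${SPARK_LOCAL_IP}"
```

Because the value is computed on each slave at launch time, every dynamically allocated worker advertises its own address rather than a hardcoded one.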

On Wed, Mar 25, 2015 at 3:04 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 It says:
 Tried to associate with unreachable remote address
 [akka.tcp://sparkDriver@localhost:51849].
 Address is now gated for 5000 ms, all messages to this address will be
 delivered to dead letters. Reason: Connection refused: localhost/
 127.0.0.1:51849

 I'd suggest changing this property:
 export SPARK_LOCAL_IP=127.0.0.1

 Point it to your network address like 192.168.1.10

 Thanks
 Best Regards

 On Tue, Mar 24, 2015 at 11:18 PM, Anirudha Jadhav aniru...@nyu.edu
 wrote:

 is there some setting i am missing:
 this is my spark-env.sh

 export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
 export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
 export SPARK_LOCAL_IP=127.0.0.1



 here is what i see on the slave node:
 [quoted stderr log trimmed; the full log appears in the original message below]

spark worker on mesos slave | possible networking config issue

2015-03-24 Thread Anirudha Jadhav
Is there some setting I am missing? This is my spark-env.sh:

export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
export SPARK_LOCAL_IP=127.0.0.1



Here is what I see on the slave node:

less
20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/stderr


WARNING: Logging before InitGoogleLogging() is written to STDERR
I0324 02:30:29.389225 27755 fetcher.cpp:76] Fetching URI '
http://100.125.5.93/sparkx.tgz'
I0324 02:30:29.389361 27755 fetcher.cpp:126] Downloading '
http://100.125.5.93/sparkx.tgz' to
'/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz'
I0324 02:30:35.353446 27755 fetcher.cpp:64] Extracted resource
'/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz'
into
'/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56'
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/24 02:30:37 INFO MesosExecutorBackend: Registered signal handlers for
[TERM, HUP, INT]
I0324 02:30:37.071077 27863 exec.cpp:132] Version: 0.21.1
I0324 02:30:37.080971 27885 exec.cpp:206] Executor registered on slave
20150226-160708-78932-5050-8971-S0
15/03/24 02:30:37 INFO MesosExecutorBackend: Registered with Mesos as
executor ID 20150226-160708-78932-5050-8971-S0 with 1 cpus
15/03/24 02:30:37 INFO SecurityManager: Changing view acls to: ubuntu
15/03/24 02:30:37 INFO SecurityManager: Changing modify acls to: ubuntu
15/03/24 02:30:37 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(ubuntu); users
with modify permissions: Set(ubuntu)
15/03/24 02:30:37 INFO Slf4jLogger: Slf4jLogger started
15/03/24 02:30:37 INFO Remoting: Starting remoting
15/03/24 02:30:38 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@mesos-si2:50542]
15/03/24 02:30:38 INFO Utils: Successfully started service 'sparkExecutor'
on port 50542.
15/03/24 02:30:38 INFO AkkaUtils: Connecting to MapOutputTracker:
akka.tcp://sparkDriver@localhost:51849/user/MapOutputTracker
15/03/24 02:30:38 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkDriver@localhost:51849]. Address is now gated for
5000 ms, all messages to this address will be delivered to dead letters.
Reason: Connection refused: localhost/127.0.0.1:51849
akka.actor.ActorNotFound: Actor not found for:
ActorSelection[Anchor(akka.tcp://sparkDriver@localhost:51849/),
Path(/user/MapOutputTracker)]
at
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at
akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
at
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)


newbie question - spark with mesos

2015-03-23 Thread Anirudha Jadhav
I have a Mesos cluster, to which I deploy Spark using the instructions at
http://spark.apache.org/docs/0.7.2/running-on-mesos.html

After that, the Spark shell starts up fine.
Then I try the following in the shell:

val data = 1 to 1

val distData = sc.parallelize(data)

distData.filter(_ < 10).collect()

open spark web ui at host:4040 and see an active job.

Now, how do I start workers (Spark workers) on Mesos? Who completes my
job?
Thanks,

-- 
Ani


Re: newbie question - spark with mesos

2015-03-23 Thread Anirudha Jadhav
:02 PM, Dean Wampler deanwamp...@gmail.com wrote:

That's a very old page, try this instead:

http://spark.apache.org/docs/latest/running-on-mesos.html

When you run your Spark job on Mesos, tasks will be started on the slave
nodes as needed, since fine-grained mode is the default.

For a job like your example, very few tasks will be needed. Actually, only
one would be enough, but the default number of partitions will be used. I
believe 8 is the default for Mesos. For local mode (local[*]), it's the
number of cores. You can also set the property spark.default.parallelism.
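As a rough illustration of why even a tiny job runs as several tasks: `parallelize` slices the collection evenly across the configured number of partitions, and each partition becomes one task. A sketch in plain Python, modeled on that even slicing rather than Spark's actual code, assuming the default of 8 partitions:

```python
# Hypothetical model of how parallelize splits a collection into
# num_slices partitions (one task per partition); not Spark's real code.
def slice_range(data, num_slices):
    n = len(data)
    return [data[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]


parts = slice_range(list(range(1, 101)), 8)
print(len(parts))                   # 8 partitions -> 8 tasks on the slaves
print(sum(len(p) for p in parts))   # all 100 elements are covered
```

So with the default of 8, Mesos launches 8 small tasks for this job even though one would suffice; lowering spark.default.parallelism would reduce that.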

HTH,

Dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Mon, Mar 23, 2015 at 11:46 AM, Anirudha Jadhav aniru...@nyu.edu wrote:

 [quoted message trimmed; see the original question above]

 --
 Ani