at this also?
Cheers
Dr Mich Talebzadeh
on two different HDFS clusters?
thanks
Dr Mich Talebzadeh
and compactness. I can write Spark Streaming code in Scala pretty fast, or import a massive RDBMS
table into Hive, into a table of my own design, equally fast using Scala.
I don't know, maybe I cannot be bothered writing 100 lines of Java for a
simple query from a table :)
Dr Mich Talebzadeh
Hence I was wondering how much truth there is in this statement. Given that
Spark uses Scala as its core development language, what is the general view
on the use of Scala, Python or Java?
Thanks,
Dr Mich Talebzadeh
process will be running on the edge node.
HTH
Dr Mich Talebzadeh
at our
future needs. From my experience of these tools, you cannot simply roll them
back without incurring considerable work and considerable cost.
And after all, will the cost justify the whole of this setup? How about
performance and other bottlenecks?
Thanks
Dr Mich Talebzadeh
Hi John,
Thanks. Did you end up in production, or in other words, did you use it in
anger besides the PoC?
The intention is to build Isilon on top of the whole HDFS cluster. If we
go that way we also need to adopt it for DR as well.
Cheers
Dr Mich Talebzadeh
proof of such tools. So I was wondering if
anyone else has tried such a solution.
Thanks
Dr Mich Talebzadeh
As a matter of interest, what is the best way of creating virtualised
clusters all pointing to the same physical data?
thanks
Dr Mich Talebzadeh
can build virtual clusters on the same data. One
cluster for read/writes and another for reads? That is what has been
suggested!
regards
Dr Mich Talebzadeh
anyone is using this product in anger.
At the end of the day it's not HDFS. It is OneFS with an HCFS API. However
that may suit our needs. But we would need to PoC it and test it thoroughly!
Cheers
Dr Mich Talebzadeh
data
to be in one place regardless of artefacts used against it such as Spark?
Thanks,
Dr Mich Talebzadeh
tring,Comparable[_
>: java.math.BigDecimal with String <: Comparable[_ >: java.math.BigDecimal
with String <: java.io.Serializable] with java.io.Serializable] with
java.io.Serializable])
val s = HiveContext.read.format("jdbc").options(
Dr Mich Talebzadeh
Hi,
This JDBC connection works with an Oracle table with primary key ID
val s = HiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
    "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
    "partitionColumn" -> "ID",    // numeric primary key used to split the read
    "lowerBound" -> "1",          // remaining options were truncated in the original; these values are illustrative only
    "upperBound" -> "100000000",
    "numPartitions" -> "10",
    "user" -> _username,          // placeholder credential vals
    "password" -> _password)).load()
% "2.6.2"
libraryDependencies += "org.apache.phoenix" % "phoenix-spark" %
"4.6.0-HBase-1.0"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.
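For context, these dependencies would sit in a build.sbt roughly like the sketch below; the project name, Scala version and Spark version are my assumptions, and the truncated hbase-client version is filled in with 1.2.3 purely to match the hbase line.

name := "spark-phoenix-hbase"            // hypothetical project name
scalaVersion := "2.10.5"                 // assumed to match the _2.10 Spark artifacts mentioned elsewhere
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"  // assumed Spark version
libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"              // version assumed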
Thanks all. How about Kafka HA, which is important? Is it best to use
application-specific Kafka delivery or Kafka MirrorMaker?
Cheers
Dr Mich Talebzadeh
for Kafka for use with Spark
Streaming?
Thanks
Dr Mich Talebzadeh
not using UNIX tools such as Nagios etc, are there tools that can
be deployed for the Spark cluster itself? I guess top/htop can be used but
those are available anyway.
Thanks
Dr Mich Talebzadeh
agreed.
The best option is to ingest into dedicated ingestion tables in Oracle. Many people
ingest straight into the main Oracle table, which is a wrong design in my opinion.
Dr Mich Talebzadeh
211)
at
org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
... 74 elided
Dr Mich Talebzadeh
Ingesting from Hive tables back into Oracle. What mechanisms are in place
to ensure that data ends up consistently in the Oracle table, and that Spark is
notified when Oracle has issues with the ingested data (say a rollback)?
Dr Mich Talebzadeh
Thanks Jorn,
So Tableau uses its own in-memory representation, as I guessed. Now the
question is how is the performance when accessing data in Oracle tables?
Dr Mich Talebzadeh
htm> containing a star schema, and create and ingest the same tables and data into Hive
tables. Then run Tableau against these tables and do the performance
comparison. Given that Oracle is widely used with Tableau, does this test make
sense?
Thanks.
Dr Mich Talebzadeh
In general, in a single JVM, which is basically running in Local mode, you
have only one SparkContext. However, you can stop the current SparkContext
with
sc.stop()
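For illustration, a minimal sketch of stopping the current context and building a replacement in the same JVM; the master URL and app name below are my own placeholders.

import org.apache.spark.{SparkConf, SparkContext}
sc.stop()                                     // stop the context the shell/application created
val conf = new SparkConf()
  .setMaster("local[*]")                      // assumed local mode
  .setAppName("replacementContext")           // hypothetical application name
val sc2 = new SparkContext(conf)              // the one active context in this JVM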
HTH
Dr Mich Talebzadeh
files have to reside in HDFS
HTH
Dr Mich Talebzadeh
you can use Spark directly on the csv files.
1. Put the csv files into HDFS /apps//data/staging/
2. Multiple csv files for the same table can co-exist
3. e.g. val df1 = spark.read.option("header", "false").csv(location), as in the sketch below
4.
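Putting the steps above together, a minimal sketch; the HDFS location and view name are illustrative only.

val location = "hdfs:///apps/data/staging/mytable"          // assumed staging path
val df1 = spark.read
  .option("header", "false")                                // files assumed to carry no header row
  .option("inferSchema", "true")                            // let Spark guess the column types
  .csv(location)                                            // reads every csv file under the directory
df1.createOrReplaceTempView("staging_mytable")              // expose it to Spark SQL
spark.sql("SELECT COUNT(*) FROM staging_mytable").show()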
Dr Mich Talebzadeh
end to do all this via a shell script that gives control at each layer and
creates alarms.
HTH
Dr Mich Talebzadeh
r file or something?
thanks
Dr Mich Talebzadeh
Thanks Kuan for the insight. Much appreciated.
Mich
Dr Mich Talebzadeh
n your Spark jobs including HA failover using
Platform Symphony HA.
Has anyone had any experience of using YARN with IBM Platform Symphony at
all, including a Proof of Concept?
Thanks
Dr Mich Talebzadeh
Hi,
Has anyone had any experience of using IBM Fluid Query and comparing it
with Spark with its MPP and in-memory capabilities?
Thanks,
Dr Mich Talebzadeh
Sounds like Cloudera do not supply the shell for spark-sql but only
spark-shell. Is that correct?
I appreciate that one can use spark-shell. However, it sounds like spark-sql
is excluded in favour of Impala?
cheers
Dr Mich Talebzadeh
ight201601;
show tables;
HTH
Dr Mich Talebzadeh
lue()).toString,
Bytes.toString( iter.next().getValue()).toString,
Bytes.toString(iter.next().getValue())
)}
The above reads the column family columns sequentially. How can I force it
to read specific columns only?
Thanks
Dr Mich Talebzadeh
tasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load
ationProvider.scala:53)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
... 56 elided
Dr Mich Talebzadeh
Ok, just to be clear, do you mean
ADD_JARS="~/jars/ojdbc6.jar" spark-shell
or
spark-shell --jars $ADD_JARS
Thanks
Dr Mich Talebzadeh
at the shell
file
"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name
"Spark shell" "$@"
hm
Dr Mich Talebzadeh
When one runs in Local mode (one JVM) on an edge host (the host from which the user
accesses the cluster), it is possible to put an additional jar file, say for
accessing Oracle RDBMS tables, in $SPARK_CLASSPATH. This works:
export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
Normally a group of users can have read access to ojdbc6.jar. FYI the Oracle
database version accessed is 11g R2.
Also it is a challenge in a multi-tenanted cluster to maintain multiple
versions of jars for the same database type through $SPARK_HOME/conf/
spark-defaults.conf!
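For what it is worth, the per-installation alternative is the classpath settings in $SPARK_HOME/conf/spark-defaults.conf, roughly as below (the paths are placeholders of mine), which is exactly what becomes awkward when different groups need different driver versions.

# hypothetical spark-defaults.conf entries; adjust paths per site
spark.driver.extraClassPath     /home/hduser/user_jars/ojdbc6.jar
spark.executor.extraClassPath   /home/hduser/user_jars/ojdbc6.jar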
HTH
Dr Mich Talebzadeh
Thanks Ayan, do you mean
"driver" -> "oracle.jdbc.OracleDriver"
We added that one but it did not work!
Dr Mich Talebzadeh
dbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
Any ideas?
oughts?
Dr Mich Talebzadeh
Try this, it should work, and yes they are comma separated:
spark-streaming-kafka_2.10-1.5.1.jar
Dr Mich Talebzadeh
How many tables are involved in the SQL join and how do you cache them?
If you do unpersist on the DF(s) and run the same SQL query (in the same
session), what do you see with explain?
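To make that concrete, a rough sketch of the comparison I have in mind; the table and view names are made up.

val df1 = spark.table("sales").cache()                       // hypothetical tables behind the join
val df2 = spark.table("customers").cache()
df1.createOrReplaceTempView("s")
df2.createOrReplaceTempView("c")
val sqlText = "SELECT c.name, SUM(s.amount) FROM s JOIN c ON s.cust_id = c.id GROUP BY c.name"
spark.sql(sqlText).explain(true)       // with the cache in place the plan should show InMemoryRelation
df1.unpersist()
df2.unpersist()
spark.sql(sqlText).explain(true)       // same SQL, same session, after unpersist: compare the two plans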
HTH
Dr Mich Talebzadeh
How many tables are involved in the SQL join and how do you cache them?
If you do unpersist on the DF and run the same
Dr Mich Talebzadeh
/views in Spark can be
used or Spark functional programming with Scala. Also the performance of
JDBC matters.
HTH
Dr Mich Talebzadeh
you JDBC connection to the RDBMS table,
and you will need to have a primary key on the table.
I am going to test it to see how performant it is to offer Spark as a fast
query engine for RDBMS.
HTH
Dr Mich Talebzadeh
are assigned to users. Is that done
by YARN?
2. What will happen if more than one Livy is running on the same cluster,
all controlled by the same YARN? How are resources allocated?
cheers
Dr Mich Talebzadeh
Thanks Richard for the link. Also its interaction with Zeppelin is great.
I believe it is at a very early stage for now.
Dr Mich Talebzadeh
Hi,
Has there been any experience using Livy with Spark to share multiple Spark
contexts?
thanks
Dr Mich Talebzadeh
The only way I can think of would be accessing the Hive tables through their
respective Thrift servers running on the different clusters, but I am not sure you
can do it within Spark. Basically two different JDBC connections.
HTH
Dr Mich Talebzadeh
on Hive. It will always get the same values as stored by Hive.
Dr Mich Talebzadeh
In this PoC of yours, are you running this app with Spark in Local mode by
any chance?
Dr Mich Talebzadeh
Thanks Ian.
Was your source of Flume IBM/MQ by any chance?
Dr Mich Talebzadeh
Hi Ian,
Has this been resolved?
How about data to Flume and then Kafka and Kafka streaming into Spark?
Thanks
Dr Mich Talebzadeh
was wondering if this is a tried and tested one as opposed to an experimental one?
For example this Spark doc
<http://spark.apache.org/docs/latest/streaming-flume-integration.html> talks
about Flume integration.
Thanks
Dr Mich Talebzadeh
Hi,
Can someone share their experience of feeding data from IBM MQ messages
into Flume, then from Flume to Kafka, and using Spark Streaming on it?
Any issues and things to be aware of?
thanks
Dr Mich Talebzadeh
Hi,
I guess the only way to do this is to read IBM MQ messages into Flume,
ingest them into HDFS and read them from there. Alternatively use Flume to
ingest the data into HBase and then use Spark on HBase.
I don't think there is an API like Spark Streaming with Kafka for IBM MQ?
thanks
Dr Mich Talebzadeh
Thanks Ayan.
That only works for extra characters like ^ characters etc. Unfortunately
it does not cure specific character sets.
cheers
Dr Mich Talebzadeh
this as well?
Thanks
Dr Mich Talebzadeh
through some test cycles.
We have some ideas but would appreciate some other feedback.
The current version is CDH 5.2.
Thanks
Dr Mich Talebzadeh
*up-to-date* the data in the replica site is going to be.
Bottom line, how good is it to deploy such a tool given the cost?
Dr Mich Talebzadeh
data from London to Singapore. It can
become a nightmare.
Dr Mich Talebzadeh
e cluster or more likely between data centers).
>
> As you mentioned, Hbase & Co may require a special consideration for the
> case that data is in-memory and not yet persisted.
>
> On Sat, Nov 12, 2016 at 12:04 PM, Mich Talebzadeh <
> mich.talebza...@gmail.c
Thanks Vince,
can you provide more details on this please?
Dr Mich Talebzadeh
I really don't see why one would want to set up streaming replication except for
situations where functionality similar to transactional databases is
required in big data.
Dr Mich Talebzadeh
of it. Streaming replication as opposed to snapshot.
Sounds familiar. Think of it as log shipping in the old Oracle days versus
GoldenGate etc.
HTH
Dr Mich Talebzadeh
Reason being?
Dr Mich Talebzadeh
starts at $4,000 per node per year all inclusive.
With a discount it can be halved, but we are talking per node, so if you
have 5 nodes in primary and 5 nodes in DR we are talking about $40K already
(10 nodes x $4,000 at list price).
HTH
Dr Mich Talebzadeh
e site.
The idea is that it is faster than doing it through traditional HDFS copy
tools, which are normally batch oriented.
It also claims to replicate Hive metadata as well.
I wanted to gauge if anyone has used it or a competitor product. The claim
is that they do not have competitors!
Thanks
Dr Mich Talebzadeh
into target tables in Hive periodically. I will still go for ORC tables.
Data will be append only.
That is my conclusion, but I am still open to suggestions.
Thanks
Dr Mich Talebzadeh
schema
I believe the above is feasible?
Dr Mich Talebzadeh
Mich
Dr Mich Talebzadeh
generic alternative?
Thanks
Dr Mich Talebzadeh
scala> df.toDF.createOrReplaceTempView("tmp")
scala> spark.sql("drop view if exists tmp")
Check the UI (port 4040) storage page to see what is cached etc.
Just try either option to see which one is more optimum. Option 2 may be
more optimum.
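As an aside, besides the UI you can also check programmatically, e.g. (assuming Spark 2.x):

spark.catalog.isCached("tmp")      // true if the view's underlying data is cached
spark.catalog.clearCache()         // removes everything from the in-memory cache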
HTH
Dr Mich Talebzadeh
r, and any additional classpath specified
# *through spark.driver.extraClassPath is not automatically propagated.*
Whether this is relevant or not, I am not sure.
HTH
Dr Mich Talebzadeh
On 1 November 2016 at 14:02, Jan Botorek <jan.boto...@infor.com> wrote:
> Yes, exactly.
> My (testing) run script is:
>
> spark-
Are you submitting your job through spark-submit?
Dr Mich Talebzadeh
that directory.
The other alternative is to mount the shared directory as an NFS mount across
all the nodes, and all the nodes can read from that shared directory.
HTH
Dr Mich Talebzadeh
removed when the session ends or the table is dropped.
Not sure how Spark handles this.
HTH
Dr Mich Talebzadeh
uster in which it was created. The data is stored using Hive's
highly-optimized, in-memory columnar format."
So on the face of it, the tempTable is an in-memory table.
HTH
Dr Mich Talebzadeh
ark.sql.DataFrame = []
Also your point
"But the thing is that I don't explicitly cache the tempTables ..".
I believe tempTable is created in-memory and is already cached
HTH
Dr Mich Talebzadeh
this in the UI storage page.
An alternative is to use persist(StorageLevel.MEMORY_AND_DISK_SER) with a
mix of cached memory and disk.
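A small sketch of that in Scala; the DataFrame name is illustrative.

import org.apache.spark.storage.StorageLevel
df.persist(StorageLevel.MEMORY_AND_DISK_SER)   // serialized in memory, spilling to disk when memory runs out
df.count()                                     // an action to actually materialise the cache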
HTH
Dr Mich Talebzadeh
know.
Have you tried it using predicate push-down on the underlying table itself?
HTH
Dr Mich Talebzadeh
I can hear and see plenty of fireworks in this foggy London tonight :)
Dr Mich Talebzadeh
Enjoy the festive season.
Regards,
Dr Mich Talebzadeh
Hi,
I think tempTable is private to the session that creates it. In Hive temp
tables created by "CREATE TEMPORARY TABLE" are all private to the session.
Spark is no different.
The alternative may be that everyone creates the tempTable from the same DF?
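i.e. something along these lines within one application; the table and view names are made up.

val df = spark.table("sales")                       // hypothetical shared source DataFrame
df.createOrReplaceTempView("sales_tmp")             // the view is visible only to this SparkSession
spark.sql("SELECT COUNT(*) FROM sales_tmp").show()

In newer Spark versions (2.1+) there is also df.createGlobalTempView("sales_tmp"), which makes the view visible to other sessions of the same application under the global_temp database.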
HTH
Dr Mich Talebzadeh
with in-memory storage where app 2
can pick up app 1's results from memory or even SSD and do the work.
Actually I am surprised that Spark has not incorporated this type of memory
as temporary storage.
Dr Mich Talebzadeh
So I assume Ignite will not work with Spark version >= 2?
Dr Mich Talebzadeh
. For example the same tempTable etc?
Cheers
Dr Mich Talebzadeh
Thanks Chanh,
Can it share RDDs?
Personally I have not used either Alluxio or Ignite.
1. Are there major differences between these two?
2. Have you tried Alluxio for sharing Spark RDDs and, if so, do you have
any experience you can kindly share?
Regards
Dr Mich Talebzadeh
with something like Apache Ignite.
Has anyone really tried this? Will that work with multiple applications?
It looks feasible as RDDs are immutable and so are registered tempTables
etc.
Thanks
Dr Mich Talebzadeh
Dr Mich Talebzadeh
of translating object state into a format that can be stored and retrieved
from a memory buffer?
Thanks
Dr Mich Talebzadeh
uted among executors?
Thanks
Dr Mich Talebzadeh
=
AverageDailyPrice: double
328.0
327.13
325.63
I can do it in the shell, but there must be a way of running the commands
silently?
Thanks
Dr Mich Talebzadeh
Hi,
The correct way of doing it for a String argument is using echo ' ', passing
the string directly as below:
spark-shell -i <(echo 'val ticker = "tsco"' ; cat stock.scala)
Dr Mich Talebzadeh
;(echo val ticker = $TICKER ; cat )
as described here
<http://stackoverflow.com/questions/29928999/passing-command-line-arguments-to-spark-shell>
Thanks
Dr Mich Talebzadeh