I think it should be. These configurations don't depend on which cluster
manager the user chooses.
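For example, enabling speculation looks the same under any master (a minimal
sketch; the values are illustrative, not recommendations):

from pyspark.sql import SparkSession

# speculation is a scheduler setting, so it has to be in place before the
# SparkContext starts (builder config or spark-defaults.conf, not runtime conf)
spark = (SparkSession.builder
         .appName("speculative-job")
         .config("spark.speculation", "true")
         .config("spark.speculation.interval", "100ms")
         .config("spark.speculation.quantile", "0.9")
         .getOrCreate())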
On Tue, Feb 28, 2017 at 4:42 AM, satishl wrote:
> Are spark.speculation and related settings supported on standalone mode?
After starting DFS, YARN, and Spark, I ran this command from the Spark root
directory on my master host:
`MASTER=yarn ./bin/run-example ml.LogisticRegressionExample
data/mllib/sample_libsvm_data.txt`
I actually took this command from Spark's README. And here is the source code
about
Hi,
I have been trying to upgrade my Spark job to 2.x, and I see the following
error. It seems to be looking for a global_temp database. Is looking up a
global_temp database by default a behaviour of Spark 2.x?
17/02/27 16:59:09 INFO HiveMetaStore.audit: ugi=user1234
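For context: Spark 2.1 introduced global temporary views (SPARK-17338), which
live in a reserved database (global_temp by default), and each session looks
that database up at startup, so the lookup in the metastore audit log appears
to be expected and harmless. A minimal sketch of the feature behind it (names
are illustrative):

df = spark.range(3)
df.createGlobalTempView("people")
# global temp views are resolved through the reserved global_temp database
spark.sql("SELECT * FROM global_temp.people").show()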
Is anybody using Spark streaming/SQL to load a relational data warehouse in
real time? There isn't a lot of information on this use case out there. When I
google real time data warehouse load, nothing I find is up to date. It's all
turn of the century stuff and doesn't take into account
Hello,
Looks like the API docs linked from the Spark Kafka 0.10 Integration page are
not current.
For instance, on the page
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
the code examples show the new API (i.e. class ConsumerStrategies). However,
Are spark.speculation and related settings supported on standalone mode?
Hi
Thanks a lot, I used a properties file to resolve the issue. I think the
documentation should mention it, though.
On Tue, 28 Feb 2017 at 5:05 am, Marcelo Vanzin wrote:
> > none of my Config settings
>
> Is it none of the configs or just the queue? You can't set the YARN
> queue
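For reference, the properties-file workaround looks roughly like this (the
file name and queue name are illustrative):

# myjob.properties
spark.yarn.queue=myqueue

spark-submit --properties-file myjob.properties ...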
Would even Hive configurations like the following work with this?
sqlContext.setConf("hive.default.fileformat", "Orc")
sqlContext.setConf("hive.exec.orc.memory.pool", "1.0")
sqlContext.setConf("hive.optimize.sort.dynamic.partition", "true")
Thanks! That works:
def process_file(my_iter):
    # the_id carries state across lines: the most recently seen record id
    the_id = "init"
    final = []
    for chunk in my_iter:
        lines = chunk[1].split("\n")
        for line in lines:
            if line[0:15] == 'WARC-Record-ID:':
                the_id = line[15:]
            final.append(Row(the_id =
I am hoping someone can confirm this is a bug and/or provide a solution. I
am trying to serialize an LDA model to disk for later use, but upon
deserialization the model is not fully functional. In particular,
transformation of data throws a NullPointerException. Here is a minimal
example (just run
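The minimal example was truncated in the archive; a round trip of roughly this
shape matches the report (data, path, and parameters are illustrative):

from pyspark.ml.clustering import LDA, LocalLDAModel
from pyspark.ml.linalg import Vectors

# a tiny corpus of (id, term-count vector) rows
df = spark.createDataFrame(
    [(0, Vectors.dense([1.0, 0.0, 3.0])),
     (1, Vectors.dense([2.0, 1.0, 0.0]))],
    ["id", "features"])

model = LDA(k=2, maxIter=5).fit(df)
model.transform(df).show()      # works before the round trip

model.save("/tmp/lda_model")
loaded = LocalLDAModel.load("/tmp/lda_model")
loaded.transform(df).show()     # reportedly throws the NullPointerException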
> none of my Config settings
Is it none of the configs or just the queue? You can't set the YARN
queue in cluster mode through code, it has to be set in the command
line. It's a chicken & egg problem (in cluster mode, the YARN app is
created before your code runs).
--properties-file works the
All you need to do is -
spark.conf.set("spark.sql.shuffle.partitions", 2000)
spark.conf.set("spark.sql.orc.filterPushdown", True)
...etc
Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for
more detail.
Regarding performance, it is better to use the DataFrame API than the Dataset
API.
On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski wrote:
> Hi Justin,
>
> I have never seen such a list. I think
Hi,
How do I set Hive configurations in Spark 2.1? I have the following from 1.6;
how do I set the Hive-related configs using the new SparkSession?
sqlContext.sql(s"use ${HIVE_DB_NAME} ")
sqlContext.setConf("hive.exec.dynamic.partition", "true")
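In 2.x the equivalent goes through SparkSession (a minimal sketch, assuming
Hive support is wanted; the database name is illustrative):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .enableHiveSupport()
         .getOrCreate())

spark.sql("use my_hive_db")
spark.conf.set("hive.exec.dynamic.partition", "true")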
I think it is the user's responsibility to validate the input before feeding
it in.
https://databricks.gitbooks.io/databricks-spark-knowledge-base/best_practices/dealing_with_bad_data.html
Ok, thanks a lot for the heads up.
Sent from my iPhone
> On Feb 25, 2017, at 10:58 AM, Steve Loughran wrote:
>
>
>> On 24 Feb 2017, at 07:47, Femi Anthony wrote:
>>
>> Have you tried reading using s3n, which is a slightly older protocol? I'm
>>
Hi,
I have a project which uses Jackson 2.8.5. Spark on the other hand seems to be
using 2.6.5
I am using maven to compile.
My original solution to the problem has been to set the Spark dependencies to
the "provided" scope and use the Maven Shade plugin to shade Jackson in my
build.
The
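For reference, the shading approach looks roughly like this in the pom.xml
(the relocated package prefix is illustrative):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- move my Jackson 2.8.5 under a private prefix so it cannot clash
             with the Jackson 2.6.5 that Spark ships -->
        <pattern>com.fasterxml.jackson</pattern>
        <shadedPattern>myapp.shaded.com.fasterxml.jackson</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>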
Hi,
master = "spark://193.70.43.207:7077"
appName = "romain2"
spark = SparkSession.builder.master(master).appName(appName).getOrCreate()
also gives me an error:
IllegalArgumentException: u"Error while instantiating
'org.apache.spark.sql.hive.HiveSessionState':"
Any way out?
Hi, Fletcher.
A case class can help construct a complex structure, and RDD, StructType, and
StructField are also helpful if you need them.
However, the code is a little confusing:
source.map { row =>
  val key = row(0)
  val buff = new ArrayBuffer[Row]()
  buff += row
  (key, buff)
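If the goal is just to bucket rows by the value in their first column, the
same thing can be written without a mutable buffer (a PySpark sketch under
that assumption; source is assumed to be an RDD of rows):

pairs = source.map(lambda row: (row[0], [row]))
grouped = pairs.reduceByKey(lambda a, b: a + b)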
This won't work:
rdd2 = rdd.flatMap(splitf)
rdd2.take(1)
[u'WARC/1.0\r']
rdd2.count()
508310
If I then try to apply a map to rdd2, the map only works on each
individual line. I need to create a state machine as in my second
function. That is, I need to apply a key to each line, but the
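The usual way to keep that kind of state across lines is mapPartitions, which
hands the function an iterator over a whole partition instead of a single
element (a sketch; the function name is illustrative, and the state resets at
partition boundaries):

def tag_lines(lines):
    the_id = "init"
    for line in lines:
        if line[0:15] == 'WARC-Record-ID:':
            the_id = line[15:]
        # every line is keyed by the most recently seen record id
        yield (the_id, line)

rdd3 = rdd2.mapPartitions(tag_lines)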