Hi,
I am trying to test Spark Streaming 2.2.0 with Confluent 3.3.0.
I am getting a lot of errors during compilation. This is my sbt:
lazy val sparkstreaming = (project in file("."))
  .settings(
    name := "sparkstreaming",
    organization := "org.arek",
    version := "0.1-SNAPSHOT",
    scalaVersion
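The message is cut off at scalaVersion; below is a minimal sketch of how the rest of such a build could look for Spark Streaming 2.2.0 against Kafka. The Scala version, connector artifact and Confluent dependency are assumptions, not taken from the original mail:

lazy val sparkstreaming = (project in file("."))
  .settings(
    name := "sparkstreaming",
    organization := "org.arek",
    version := "0.1-SNAPSHOT",
    scalaVersion := "2.11.8", // assumed: Spark 2.2.0 artifacts are published for Scala 2.11
    resolvers += "confluent" at "http://packages.confluent.io/maven/",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
      // assumed connector; Confluent 3.3.0 brokers speak the Kafka 0.10+ protocol
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0",
      // assumed, only needed if messages are Avro with the Schema Registry
      "io.confluent" % "kafka-avro-serializer" % "3.3.0"
    )
  )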
You don't need multiple SparkSessions to have more than one stream
running, and from a maintenance and reliability perspective multiple
sessions are not a good idea anyway.
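For example, a single SparkSession can drive several streaming queries at once; a minimal sketch (the source, sink and query names are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("two-streams").getOrCreate()

// two independent queries, both owned by the same session
val q1 = spark.readStream.format("rate").load()
  .writeStream.format("console").queryName("stream1").start()

val q2 = spark.readStream.format("rate").load()
  .writeStream.format("console").queryName("stream2").start()

spark.streams.awaitAnyTermination()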
On Thu, Sep 7, 2017 at 2:40 AM, kant kodali wrote:
> Hi All,
>
> I am wondering if it is ok to have multiple SparkSessions in
It works fine, but it is not supported by Cloudera.
On May 17, 2017 1:30 PM, "issues solution"
wrote:
> Hi ,
> Is it possible to use the prebuilt version of Spark 2.1 inside Cloudera 5.8,
> which has Scala 2.10 rather than 2.11 and Java 1.7 rather than Java 1.8?
>
> Why ?
>
>
:
spark.read.option("sep",";").option("header", "true").option("inferSchema",
"true").format("csv").load("nonuslocalized.csv")
If not, should I create a JIRA ticket for this? I can work on a solution if
one is not available.
Best Regards,
Arkadiusz Bicz
Sorry, I've found one error:
If you do NOT need any relational processing of your messages (based on
historical data, or joining with other messages) and message
processing is quite independent, then Kafka plus Spark Streaming could be
overkill.
On Tue, Apr 19, 2016 at 1:54 PM, Arkadiusz Bicz
on it have access to your cache and disk. For caching, Alluxio looks
the most promising to me.
BR,
Arkadiusz Bicz
On Tue, Apr 19, 2016 at 6:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Hi all,
> I am looking for an architecture to ingest 10 million messages in the micro
> batch
Hello,
Are there any statistics regarding YARN vs Standalone Spark usage in
production?
I would like to choose the most supported and most widely used technology
in production for our project.
BR,
Arkadiusz Bicz
Hi,
HDFS is append-only, so to modify data you need to read it and write it
to another place.
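A minimal sketch of that read-modify-write pattern (the paths and the filter are hypothetical):

// HDFS files are immutable, so "updating" Parquet means reading the data,
// applying the change, and writing the result to a different location
val current = sqlContext.read.parquet("/data/current")       // hypothetical path
val updated = current.filter("id != '42'")                   // drop the rows to "delete"
  .unionAll(sqlContext.read.parquet("/data/corrections"))    // add the corrected rows
updated.write.mode("overwrite").parquet("/data/current_v2")  // then point readers at the new path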
On Wed, Feb 17, 2016 at 2:45 AM, SRK wrote:
> Hi,
>
> How do I update data saved as Parquet in hdfs using dataframes? If I use
> SaveMode.Append, it just seems to append the data
I had a similar problem to #2 when I used a lot of caching and then did
shuffling. It looks like when I cached too much there was not enough
space for other Spark tasks and the job just hung.
So you can try caching less and see if that improves things; executor logs
also help a lot (watch out for logs with
Hi,
You need good monitoring tools to send you alarms about disk, network
or application errors, but I think that is general DevOps work, not
very specific to Spark or Hadoop.
BR,
Arkadiusz Bicz
https://www.linkedin.com/in/arkadiuszbicz
On Thu, Feb 11, 2016 at 7:09 PM, Andy Davidson
way a lot of data.
Also, when using DataFrames there is a huge overhead from caching file
information, as described in
https://issues.apache.org/jira/browse/SPARK-11441
BR,
Arkadiusz Bicz
https://www.linkedin.com/in/arkadiuszbicz
On Thu, Feb 11, 2016 at 7:24 PM, Jakob Odersky <ja...@odersky.com> wrote:
github.com/pabloa/grafana-alerts. I
have not used grafana-alerts but it looks promising.
BR,
Arkadiusz Bicz
On Fri, Feb 12, 2016 at 4:38 PM, Andy Davidson
<a...@santacruzintegration.com> wrote:
> Hi Arkadiusz
>
> Do you have any suggestions?
>
> As an engineer I think when
val sumRes = new AggregateResults().toColumn
import sqlContext.implicits._
val dsResults = Seq(ResultSmallA("1", "1", Array[Double](1.0, 2.0)),
  ResultSmallA("1", "1", Array[Double](1.0, 2.0))).toDS()
dsResults.groupBy(_.tradeId).agg(sumRes)
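For reference, a sketch of what an Aggregator such as AggregateResults might look like on the 1.6 API; the element-wise vector sum is only a guess at its intent (Spark 2.x additionally requires bufferEncoder and outputEncoder):

import org.apache.spark.sql.expressions.Aggregator

case class ResultSmallA(tradeId: String, agreement: String, values: Array[Double])

class AggregateResults extends Aggregator[ResultSmallA, Array[Double], Array[Double]] {
  def zero: Array[Double] = Array.empty[Double]
  def reduce(buf: Array[Double], row: ResultSmallA): Array[Double] = merge(buf, row.values)
  def merge(a: Array[Double], b: Array[Double]): Array[Double] =
    if (a.isEmpty) b
    else if (b.isEmpty) a
    else a.zip(b).map { case (x, y) => x + y }  // element-wise sum of the vectors
  def finish(reduction: Array[Double]): Array[Double] = reduction
}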
Best Regards,
Arkadiusz Bicz
https
-with-graphite-and-grafana/
BR,
Arkadiusz Bicz
On Thu, Jan 21, 2016 at 5:33 AM, charles li <charles.up...@gmail.com> wrote:
> I've posted a thread before: pre-installing 3rd-party Python packages on a
> Spark cluster
>
> currently I use Fabric to manage my cluster, but it's not enou
Why does it need to be only one file? Spark does a good job of writing to
many files.
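That said, if one Parquet file per partition value is really required, a common approach (the column name and path below are hypothetical) is:

// repartitioning by the partition column sends each value's rows to a single
// task, so the subsequent partitionBy writes roughly one file per value
df.repartition(df("date"))
  .write
  .partitionBy("date")
  .parquet("/tmp/output_by_date")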
On Fri, Jan 15, 2016 at 7:48 AM, Patrick McGloin
wrote:
> Hi,
>
> I would like to repartition / coalesce my data so that it is saved into one
> Parquet file per partition. I would also like
Hi,
What is the proper configuration for saving a Parquet partition with a
large number of repeated keys?
In the code below I load 500 million rows of data and partition them on a
column with not many distinct values.
I am using spark-shell with 30g per executor and driver, and 3 executor cores.
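For reference, that spark-shell invocation would look roughly like this:

spark-shell --driver-memory 30g --executor-memory 30g --executor-cores 3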
Hi,
Including the query plan:
DataFrame:
== Physical Plan ==
SortBasedAggregate(key=[agreement#23],
functions=[(MaxVectorAggFunction(values#3),mode=Final,isDistinct=false)],
output=[agreement#23,maxvalues#27])
+- ConvertToSafe
+- Sort [agreement#23 ASC], false, 0
+- TungstenExchange
Hi,
I have done some performance tests by repeating the execution with
different numbers of executors and memory settings for YARN-clustered Spark
(version 1.6.0) (the cluster contains 6 large nodes).
I found Dataset joinWith and cogroup to be 3 to 5 times slower than a
broadcast join on DataFrames, how to
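For comparison, the two join styles being measured look roughly like this; the case classes and data are made up, and sqlContext is the one predefined in the 1.6 shell:

import org.apache.spark.sql.functions.broadcast
import sqlContext.implicits._

case class Trade(tradeId: String, notional: Double)
case class Agreement(tradeId: String, agreement: String)

val trades     = Seq(Trade("1", 100.0), Trade("2", 250.0)).toDS()
val agreements = Seq(Agreement("1", "ISDA")).toDS()

// DataFrame broadcast join: the small side is shipped to every executor,
// so the large side is never shuffled
val dfJoined = trades.toDF().join(broadcast(agreements.toDF()), "tradeId")

// Dataset typed join: both sides are shuffled on the join condition
val dsJoined = trades.as("t").joinWith(agreements.as("a"),
  $"t.tradeId" === $"a.tradeId")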
Hi,
You can check out http://spark.apache.org/docs/latest/monitoring.html;
you can monitor HDFS and memory usage per job, executor and driver. I
have connected it to Graphite for storage and Grafana for
visualization. I have also connected collectd, which provides me with all
server node metrics
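Concretely, the Graphite connection goes through Spark's conf/metrics.properties; a minimal sketch (host, port and prefix are placeholders):

# conf/metrics.properties -- push all Spark metrics to Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark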
Hi,
There is some documentation at
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
and you can also check out the tests in DatasetSuite in the Spark sources.
BR,
Arkadiusz Bicz
On Mon, Jan 11, 2016 at 5:37 AM, Muthu Jayakumar <bablo...@gmail.
hc.sql("create external table testsubdirectories (id string, value
string) STORED AS PARQUET location '/tmp/df'")
val hcall = hc.sql("select * from testsubdirectories")
assert(hcall.count() == 6) // should return 6, but it is 0 because the
// subdirectories are not read
Thanks,
Arkadiusz Bicz
.parquet("/tmp/smallresults2")
The same happens when I create an external table in the hive context as the results table:
hiveContext.sql("select * from results limit
1").write.parquet("/tmp/results/one3")
Thanks,
Arkadiusz Bicz
Hello,
Spark collects HDFS read/write metrics per application/job; see the details at
http://spark.apache.org/docs/latest/monitoring.html.
I have connected the Spark metrics to Graphite and display nice graphs
in Grafana.
BR,
Arek
On Thu, Dec 31, 2015 at 2:00 PM, Steve Loughran