inside the < raw >..< /raw >, so text-only
> mail clients prune what's inside.
> Anyway here’s the text again. (Inline)
>
> > On 02-May-2016, at 23:56, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > Maybe you were trying to embed pictures for the error and you
Maybe you were trying to embed pictures for the error and your code - but
they didn't go through.
On Mon, May 2, 2016 at 10:32 AM, meson10 wrote:
> Hi,
>
> I am trying to save a RDD to Cassandra but I am running into the following
> error:
>
>
>
> The Python code looks
Please consider decreasing block size.
Thanks
> On May 1, 2016, at 9:19 PM, Buntu Dev wrote:
>
> I got a 10g limitation on the executors and operating on parquet dataset with
> block size 70M with 200 blocks. I keep hitting the memory limits when doing a
> 'select * from
er artifact
> org.apache:apache:pom:14 from/to central (https://repo1.maven.org/maven2):
> Connect to repo1.maven.org:443 [repo1.maven.org/23.235.47.209] failed:
> Connection timed out and 'parent.relativePath' points at wrong local POM @
> line 22, column 11 -> [Help 2]
> [ERROR]
> [ER
It's because it failed to download this url:
> http://maven.twttr.com/org/apache/apache/14/apache-14.pom
>
>
> -- Original Message ------
> *From:* "Ted Yu";<yuzhih...@gmail.com>;
> *Sent:* Sunday, May 1, 2016, 9:27 PM
> *To:* "sunday2000"<2314476...
According to
examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala
:
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig,
ProducerRecord}
Can you give the command line you used to submit the job ?
Probably classpath issue.
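If it is a classpath issue, a submit command along these lines usually fixes it (the jar name, path, class and app jar below are only placeholders for whatever you actually built):
spark-submit --master yarn \
  --jars /path/to/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
  --class your.package.KafkaWordCount \
  your-app.jar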
On Sun, May 1, 2016 at
bq. Non-resolvable parent POM for org.apache.spark:spark-parent_2.10:1.6.1
Looks like you were using Spark 1.6.1
Can you check firewall settings ?
I saw similar report from Chinese users.
Consider using proxy.
On Sun, May 1, 2016 at 4:19 AM, sunday2000 <2314476...@qq.com> wrote:
> Hi,
> We
Can you provide a bit more information:
Does the smaller dataset have skew ?
Which release of Spark are you using ?
How much memory did you specify ?
Thanks
On Sat, Apr 30, 2016 at 1:17 PM, Brandon White
wrote:
> Hello,
>
> I am writing to datasets. One dataset is
For #1, have you seen this JIRA ?
[SPARK-14867][BUILD] Remove `--force` option in `build/mvn`
On Thu, Apr 28, 2016 at 8:27 PM, Demon King wrote:
> BUG 1:
> I have installed maven 3.0.2 in the system. When I use make-distribution.sh,
> it seems not to use maven 3.2.2 but uses
What happened when you tried to access port 8080 ?
Checking iptables settings is good to do.
At my employer, we use OpenStack clusters daily and don't encounter many
problems, including with UI access.
Probably some settings should be tuned.
On Thu, Apr 28, 2016 at 5:03 AM, Dan Dong
common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/
e
> on how to save data. There is only one for reading/querying data. Will this
> be added when the final version does get released?
>
> Thanks,
> Ben
>
>> On Apr 21, 2016, at 6:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> The hbase-spark
Did you have a chance to take jstack when VersionsSuite was running ?
You can use the following command to run the test:
sbt/sbt "test-only org.apache.spark.sql.hive.client.VersionsSuite"
On Wed, Apr 27, 2016 at 9:01 PM, Demon King wrote:
> Hi, all:
>I compile
>>>
>>> And I got; org.apache.spark.sql.AnalysisException: Reference 'b' is
>>> ambiguous, could be: b#6, b#14.;
>>> If same case, this message makes sense and this is clear.
>>>
>>> Thought?
>>>
>>> // maropu
>>>
>
Did you do the import as the first comment shows ?
> On Apr 27, 2016, at 2:42 AM, shengshanzhang wrote:
>
> Hi,
>
> On spark website, there is code as follows showing how to create
> datasets.
>
>
> However when i input this line into
.spark.sql.AnalysisException: Reference 'b' is
> ambiguous, could be: b#6, b#14.;
> If same case, this message makes sense and this is clear.
>
> Thought?
>
> // maropu
>
>
>
>
>
>
>
> On Wed, Apr 27, 2016 at 6:09 AM, Prasad Ravilla <pras...@slalom.com
Looking at the cause of the error, it seems hadoop-aws-xx.jar
(corresponding to the version of hadoop you use) was missing in classpath.
FYI
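As an illustration, one way to get the matching jars onto the classpath at submit time (the version shown is a placeholder; it should track your Hadoop build):
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2 your-app.jar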
On Tue, Apr 26, 2016 at 9:06 AM, Jinan Alhajjaj
wrote:
> Hi All,
> I am trying to read a file stored in Amazon S3.
> I wrote
Please take a look at:
core/src/main/scala/org/apache/spark/SparkContext.scala
* Do `val rdd = sparkContext.wholeTextFile("hdfs://a-hdfs-path")`,
*
* then `rdd` contains
* {{{
* (a-hdfs-path/part-0, its content)
* (a-hdfs-path/part-1, its content)
* ...
*
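A minimal usage sketch based on that comment (the path is a placeholder):
val rdd = sc.wholeTextFiles("hdfs://a-hdfs-path")            // RDD[(fileName, fileContent)]
rdd.map { case (file, content) => (file, content.length) }   // keep the file name alongside a derived value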
Can you show us the structure of df2 and df3 ?
Thanks
On Mon, Apr 25, 2016 at 8:23 PM, Divya Gehlot
wrote:
> Hi,
> I am using Spark 1.5.2 .
> I have a use case where I need to join the same dataframe twice on two
> different columns.
> I am getting error missing
Can you show a snippet of your code which demonstrates what you observed ?
Thanks
On Mon, Apr 25, 2016 at 8:38 AM, Weiping Qu wrote:
> Thanks.
> I read that from the specification.
> I thought the way people distinguish actions and transformations depends
> on whether
Can you show more of your code inside the while loop ?
Which version of Spark / Kinesis do you use ?
Thanks
On Mon, Apr 25, 2016 at 4:04 AM, Selvam Raman wrote:
> I am reading a data from Kinesis stream (merging shard values with union
> stream) to spark streaming. then
Have you taken a look at:
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
On Sun, Apr 24, 2016 at 8:18 AM, coder wrote:
> JavaRDD prdd = sc.textFile("c:\\fls\\people.txt").map(
> new Function() {
> public
Which version of Spark are you using ?
How did you increase the open file limit ?
Which operating system do you use ?
Please see Example 6. ulimit Settings on Ubuntu under:
http://hbase.apache.org/book.html#basic.prerequisites
On Sun, Apr 24, 2016 at 2:34 AM, fanooos
Can you check that the DFSClient Spark uses is the same version as on the
server side ?
The client and server (NameNode) negotiate a "crypto protocol version" -
this is a forward-looking feature.
Please note:
bq. Client provided: []
Meaning the client didn't provide any supported crypto protocol.
Which hbase release are you using ?
Below is the write method from hbase 1.1 :
public void write(KEY key, Mutation value)
    throws IOException {
  if (!(value instanceof Put) && !(value instanceof Delete)) {
    throw new IOException("Pass a Delete or a Put");
  }
The class is private :
final class OffsetRange private(
On Fri, Apr 22, 2016 at 4:08 PM, Mich Talebzadeh
wrote:
> Ok I decided to forgo that approach and use an existing program of mine
> with slight modification. The code is this
>
> import
This was added by Xiao through:
[SPARK-13320][SQL] Support Star in CreateStruct/CreateArray and Error
Handling when DataFrame/DataSet Functions using Star
I tried in spark-shell and got:
scala> val first =
structDf.groupBy($"a").agg(min(struct($"record.*"))).first()
first:
Marcelo:
From yesterday's thread, Mich revealed that he was looking at:
https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
which references SparkFunSuite.
In an earlier thread, Mich was asking about CEP.
Just
scala:654:
> not found: type WindowState
> [error] def deep(in: Tick, ew: WindowState): Boolean = {
> [error]^
> [error]
> /data6/hduser/scala/CEP_assembly/src/main/scala/myPackage/CEP_assemly.scala:660:
> not found: type WindowState
> [error]
There is not much in the body of email.
Can you elaborate what issue you encountered ?
Thanks
On Fri, Apr 22, 2016 at 2:27 AM, Rowson, Andrew G. (TR Technology & Ops) <
andrew.row...@thomsonreuters.com> wrote:
>
>
>
> This e-mail is for the sole use of the
Normally Logging would be included in spark-shell session since spark-core
jar is imported by default:
scala> import org.apache.spark.internal.Logging
import org.apache.spark.internal.Logging
See this JIRA:
[SPARK-13928] Move org.apache.spark.Logging into
org.apache.spark.internal.Logging
In
dOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 21 April 2016 at 20:24, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Plug in 1.5.1 for your jars:
>>
>> $ jar tvf ./core/target/s
>> 3982 Wed Sep 23 23:34:26 BST 2015 org/apache/spark/SparkFunSuite.class
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn *
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJ
Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
In KafkaWordCount, the String is sent back and producer.send() is called.
I guess if you don't find a viable solution in your current design, you can
consider the above.
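The producer part there is roughly the following sketch (broker address, topic and the values sent are placeholders):
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("wordcounts", "some-key", "some-value"))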
On Thu, Apr 21, 2016 at 10:04 AM, Alexander Gallego
wrote:
> Hello,
>
> I understand that you cannot
e.spark
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmi
ome/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar,/home/hduser/jars/scalatest_2.11-2.2.6.jar'
>>
>>
>> scala> import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
>> import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
>>
>>
>
I noticed an extra leading comma after '--jars' in your email.
Not sure if that matters.
On Thu, Apr 21, 2016 at 8:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Mich:
>
> $ jar tvf
> /home/hbase/.m2/repository/org/scalatest/scalatest_2.11/2.2.6/scalatest_2.11-2.2.6.jar
> | grep BeforeA
Mich:
$ jar tvf
/home/hbase/.m2/repository/org/scalatest/scalatest_2.11/2.2.6/scalatest_2.11-2.2.6.jar
| grep BeforeAndAfter
4257 Sat Dec 26 14:35:48 PST 2015 org/scalatest/BeforeAndAfter$class.class
2602 Sat Dec 26 14:35:48 PST 2015 org/scalatest/BeforeAndAfter.class
1998 Sat Dec 26
The hbase-spark module in Apache HBase (coming with hbase 2.0 release) can
do this.
On Thu, Apr 21, 2016 at 6:52 AM, Benjamin Kim wrote:
> Has anyone found an easy way to save a DataFrame into HBase?
>
> Thanks,
> Ben
>
>
>
In upcoming 2.0 release, the signature for map() has become:
def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
Note: DataFrame and DataSet are unified in 2.0
FYI
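A small sketch of what that looks like from user code in 2.0 (assumes a SparkSession named spark):
import spark.implicits._

val ds = Seq("a", "bb", "ccc").toDS()   // Dataset[String]
val lens = ds.map(_.length)             // the implicit Encoder[Int] satisfies the new signature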
On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan wrote:
> Hello everyone,
>
> Generally
>
> Can't translate null value for field
> StructField(density,DecimalType(4,2),true)
> On Apr 21, 2016 1:37 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
>> The weight field is not nullable.
>>
>> Looks like your source table had null value for this fi
The weight field is not nullable.
Looks like your source table had null value for this field.
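If you control the target schema, declaring the field nullable is one way around it; a sketch (the field name and type here only mirror the error message):
import org.apache.spark.sql.types._

val weightField = StructField("weight", DecimalType(4, 2), nullable = true)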
On Wed, Apr 20, 2016 at 4:11 PM, Charles Nnamdi Akalugwu <
cprenzb...@gmail.com> wrote:
> Hi,
>
> I am using spark 1.4.1 and trying to copy all rows from a table in one
> MySQL Database to a Amazon RDS
Please take a look at:
https://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes
On Wed, Apr 20, 2016 at 9:50 AM, Ashok Kumar
wrote:
> Hi,
>
> I have Spark 1.6.1 but I do not know how to invoke SparkR so I can use R
> with Spark.
>
> Is there a shell
Do you mind trying out build from master branch ?
1.5.3 is a bit old.
On Wed, Apr 20, 2016 at 5:25 AM, FangFang Chen
wrote:
> I found spark sql loses precision and handles data as int following some rule.
> Following is the data obtained via the hive shell and spark sql, with the same sql
Can you tell us the memory parameters you used ?
If you can capture a jmap dump before the GC limit is exceeded, that would give
us more of a clue.
Thanks
> On Apr 19, 2016, at 7:40 PM, "kramer2...@126.com" wrote:
>
> Hi All
>
> I use spark doing some calculation.
> The situation
Using
http://www.ruddwire.com/handy-code/date-to-millisecond-calculators/#.VxZh3iMrKuo
, 1460823008000 is shown to be 'Sat Apr 16 2016 09:10:08 GMT-0700'
Can you clarify the 4 day difference ?
bq. 'right now April 14th'
The date of your email was Apr 16th.
On Sat, Apr 16, 2016 at 9:39 AM,
The CatalogTracker object may not be used by all the methods of HBaseAdmin.
Meaning, when HBaseAdmin is constructed, we don't need CatalogTracker.
On Tue, Apr 19, 2016 at 6:09 AM, WangYQ wrote:
> in hbase 0.98.10, class "HBaseAdmin "
> line 303, method
ly not working, at least for logging configuration.
>
> Thanks,
> -carlos.
>
> On Fri, Apr 15, 2016 at 3:28 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> See this thread: http://search-hadoop.com/m/q3RTtsFrd61q291j1
>>
>> On Fri, Apr 15, 2016 at 5:38 AM,
of the 1.6.1 artifacts to that S3 bucket, so hopefully everything should be
> working now. Let me know if you still encounter any problems with
> unarchiving.
>
> On Sat, Apr 16, 2016 at 3:10 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Pardon me - there is no tarball for hado
I tried changing the URL to
> https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.7.tgz
> and I get a NoSuchKey error.
>
> Should I just go with it even though it says hadoop2.6?
>
> On Sat, Apr 16, 2016 at 5:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
Apr 16, 2016 at 2:14 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> From the output you posted:
> ---
> Unpacking Spark
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> ---
>
> The artifac
From the output you posted:
---
Unpacking Spark
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
---
The artifact for spark-1.6.1-bin-hadoop2.6 is corrupt.
This problem has been reported in other threads.
Try
Kevin:
Can you describe how you got over the Metadata fetch exception ?
> On Apr 16, 2016, at 9:41 AM, Kevin Eid wrote:
>
> One last email to announce that I've fixed all of the issues. Don't hesitate
> to contact me if you encounter the same. I'd be happy to help.
>
>
Looks like this question is more relevant on the flink mailing list :-)
On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh
wrote:
> Hi,
>
> Has anyone used Apache Flink instead of Spark by any chance
>
> I am interested in its set of libraries for Complex Event Processing.
Please send query to user@hbase
This is the default value:
zookeeper.znode.parent
/hbase
Looks like the hbase-site.xml accessible on your client didn't have an up-to-date
value for zookeeper.znode.parent.
Please make sure hbase-site.xml with proper config is on the classpath.
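If shipping the right hbase-site.xml is awkward, setting it programmatically is another option; a sketch (the value must match what your cluster actually uses):
import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("zookeeper.znode.parent", "/hbase")   // default shown above; adjust for your cluster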
On Sat, Apr 16,
See this thread: http://search-hadoop.com/m/q3RTtsFrd61q291j1
On Fri, Apr 15, 2016 at 5:38 AM, Carlos Rojas Matas
wrote:
> Hi guys,
>
> any clue on this? Clearly the
> spark.executor.extraJavaOpts=-Dlog4j.configuration is not working on the
> executors.
>
> Thanks,
>
You can call stop() method.
> On Apr 15, 2016, at 5:21 AM, ram kumar wrote:
>
> Hi,
> I started hivecontext as,
>
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
>
> I want to stop this sql context
>
> Thanks
For Parquet, please take a look at SPARK-1251
For ORC, not sure.
Looking at git history, I found ORC mentioned by SPARK-1368
FYI
On Thu, Apr 14, 2016 at 6:53 PM, Edmon Begoli wrote:
> I am needing this fact for the research paper I am writing right now.
>
> When did Spark
bq. localtest.txt#appSees.txt
Which file did you want to pass ?
Thanks
On Thu, Apr 14, 2016 at 2:14 PM, Benjamin Zaitlen
wrote:
> Hi All,
>
> I'm trying to use the --files option with yarn:
>
> spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files
>>
Can you pastebin the failure message ?
Did you happen to take jstack during the close ?
Which Hadoop version do you use ?
Thanks
> On Apr 14, 2016, at 5:53 AM, nihed mbarek wrote:
>
> Hi,
> I have an issue with closing my application context, the process take a long
>
w.r.t. the effective storage level log, here is the JIRA which introduced
it:
[SPARK-4671][Streaming]Do not replicate streaming block when WAL is enabled
On Wed, Apr 13, 2016 at 7:43 AM, Patrick McGloin
wrote:
> Hi all,
>
> If I am using a Custom Receiver with
bq. --conf "spark.executor.extraJavaOptions=-Dlog4j.
configuration=env/dev/log4j-driver.properties"
I think the above may have a typo : you refer to log4j-driver.properties in
both arguments.
FYI
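Presumably the intent was something like the pair below, with separate files for driver and executors (log4j-executor.properties is just an assumed name):
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-driver.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-executor.properties"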
On Wed, Apr 13, 2016 at 8:09 AM, Carlos Rojas Matas
wrote:
> Hi guys,
>
>
FYI
https://documentation.cpanel.net/display/CKB/How+To+Clear+Your+DNS+Cache#HowToClearYourDNSCache-MacOS®10.10
https://www.whatsmydns.net/flush-dns.html#linux
On Tue, Apr 12, 2016 at 2:44 PM, Bibudh Lahiri
wrote:
> Hi,
>
> I am trying to run a piece of code with
You can find various examples involving Serializable Java POJO
e.g.
./examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
Please pastebin some details on 'Task not serializable error'
Thanks
On Tue, Apr 12, 2016 at 12:44 PM, Daniel Valdivia
wrote:
See
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
On Tue, Apr 12, 2016 at 8:52 AM, ImMr.K <875061...@qq.com> wrote:
> But how to import spark repo into idea or eclipse?
>
>
>
> -- Original Message ---------
gen-idea doesn't seem to be a valid command:
[warn] Ignoring load failure: no project loaded.
[error] Not a valid command: gen-idea
[error] gen-idea
On Tue, Apr 12, 2016 at 8:28 AM, ImMr.K <875061...@qq.com> wrote:
> Hi,
> I have cloned spark and ,
> cd spark
> build/sbt gen-idea
>
> got the
See
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
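If what you are after is running jobs concurrently inside one application, a sketch of the fair-scheduler setup from that page (the pool name is a placeholder):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("multi-job").set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)
sc.setLocalProperty("spark.scheduler.pool", "pool1")   // jobs submitted from this thread go to pool1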
On Mon, Apr 11, 2016 at 3:15 PM, Jialin Liu wrote:
> Hi Spark users/experts,
>
> I’m wondering how does the Spark scheduler work?
> What kind of resources will be considered during the
Please take a look
at
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
Cheers
On Mon, Apr 11, 2016 at 12:13 PM, Radhakrishnan Iyer <
radhakrishnan.i...@citiustech.com> wrote:
> Hi all,
>
>
>
> I am new to Spark.
>
> I have a json in below format :
>
>
For SparkR, please refer to https://spark.apache.org/docs/latest/sparkr.html
bq. on Ubuntu or CentOS
Both platforms are supported.
On Mon, Apr 11, 2016 at 1:08 PM, wrote:
> Dear Experts ,
>
> I am posting this for your information. I am a newbie to spark.
> I am
map(lambda x : x.rsplit('\t',1)).map(lambda x :
> [x[0],getRows(x[1])]).cache()\
> .groupBy(lambda x : x[0].split('\t')[1]).mapValues(lambda x :
> list(x)).cache()
>
> text1.count()
>
> Thanks and Regards,
> Suraj Sheth
>
> On Sun, Apr 10, 2016 at 1:19 AM, Ted Yu <
llecting only TaskEnd Events.
>
> I can do the event wise summation for couple of runs and get back to you.
>
>
>
> Thanks,
>
> Jasmine
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Thursday, April 07, 2016 1:43 PM
> *To:* JasmineGeorge
>
Haven't found any JIRA w.r.t. combineByKey for Dataset.
What's your use case ?
Thanks
On Sat, Apr 9, 2016 at 7:38 PM, Amit Sela wrote:
> Is there (planned ?) a combineByKey support for Dataset ?
> Is / Will there be a support for combiner lifting ?
>
> Thanks,
> Amit
>
Looks like the exception occurred on driver.
Consider increasing the values for the following config:
conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
Cheers
On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev wrote:
> I'm running it via
The value was out of the range of integer.
Which Spark release are you using ?
Can you post snippet of code which can reproduce the error ?
Thanks
On Sat, Apr 9, 2016 at 12:25 PM, SURAJ SHETH wrote:
> I am trying to perform some processing and cache and count the RDD.
>
mahesh :
bq. :16: error: not found: value sqlContext
Please take a look at:
https://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext
for how the import should be used.
Please include version of Spark and the commandline you used in the reply.
roupBy and a count in
> pyspark.sql on a Spark DataFrame.
>
> Any ideas?
>
> Nicolas
>
> On Fri, Apr 8, 2016 at 1:13 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Did you encounter similar error on a smaller dataset ?
>>
>> Which release of Spark are you us
Did you encounter similar error on a smaller dataset ?
Which release of Spark are you using ?
Is it possible you have an incompatible snappy version somewhere in your
classpath ?
Thanks
On Fri, Apr 8, 2016 at 12:36 PM, entee wrote:
> I'm trying to do a relatively large
I searched 1.6.1 code base but didn't find how this can be configured
(within Spark).
On Fri, Apr 8, 2016 at 9:01 AM, nihed mbarek wrote:
> Hi
> How to configure parquet.block.size on Spark 1.6 ?
>
> Thank you
> Nihed MBAREK
>
>
> --
>
> M'BAREK Med Nihed,
> Fedora
Looks like you're using Spark 1.6.x
What error(s) did you get for the first two joins ?
Thanks
On Fri, Apr 8, 2016 at 3:53 AM, JH P wrote:
> Hi. I want a dataset join with itself. So i tried below codes.
>
> 1. newGnsDS.joinWith(newGnsDS, $"dataType”)
>
> 2.
Which Spark release are you using ?
Have you registered to all the events provided by SparkListener ?
If so, can you do event-wise summation of execution time ?
Thanks
On Thu, Apr 7, 2016 at 11:03 AM, JasmineGeorge wrote:
> We are running a batch job with the following
This is the version of Kafka Spark depends on:
[INFO] +- org.apache.kafka:kafka_2.10:jar:0.8.2.1:compile
On Thu, Apr 7, 2016 at 9:14 AM, Haroon Rasheed
wrote:
> Try removing libraryDependencies += "org.apache.kafka" %% "kafka" % "1.6.0"
> compile. I guess the internal
Have you looked at SparkListener ?
/**
* Called when the driver registers a new executor.
*/
def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit
/**
* Called when the driver removes an executor.
*/
def onExecutorRemoved(executorRemoved:
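A bare-bones listener built on those callbacks could look like this (the class name is made up):
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

class ExecutorTracker extends SparkListener {
  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    println(s"executor added: ${added.executorId}")
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    println(s"executor removed: ${removed.executorId} (${removed.reason})")
}
// register it with: sc.addSparkListener(new ExecutorTracker())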
Which hadoop release are you using ?
bq. yarn cluster with 2GB RAM
I assume 2GB is per node. Isn't this too low for your use case ?
Cheers
On Wed, Apr 6, 2016 at 9:19 AM, Peter Rudenko
wrote:
> Hi i have a situation, say i have a yarn cluster with 2GB RAM. I'm
>
The error was due to REPL expecting an integer (index to the Array) whereas
"MAX(count)" was a String.
What do you want to achieve ?
On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote:
> Hello,
>
> i am writing one spark application i which i need the index of the
Did you define idxmax() method yourself ?
Thanks
On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote:
> Hello,
>
> i am writing one spark application i which i need the index of the maximum
> element.
>
> My table has one column only and i want the index of the maximum
bq. I'm on version 2.10 for spark
The above is Scala version.
Can you give us the Spark version ?
Thanks
On Mon, Apr 4, 2016 at 2:36 PM, mpawashe wrote:
> Hi all,
>
> I am using Spark Streaming API (I'm on version 2.10 for spark and
> streaming), and I am running into a
bq. the modifications do not touch the scheduler
If the changes can be ported over to 1.6.1, do you mind reproducing the
issue there ?
I ask because master branch changes very fast. It would be good to narrow
the scope where the behavior you observed started showing.
On Mon, Apr 4, 2016 at 6:12
t;*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 3 April 2016 at 18:05, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Mich:
>> See the following method of DStream:
>>
>>* Print the first num elements of each RDD gen
ot a member of
>> org.apache.spark.streaming.dstream.DStream[(String, Int)]
>> val v = lines.filter(_.contains("ASE 15")).filter(_
>> contains("UPDATE INDEX STATISTICS")).flatMap(line =>
>> line.split("\n,")).map(word => (word, 1)).reduceByKey(_ +
>>
refer to the content of the stream here?
>
> Thanks
>
>
>
>
>
>
>
>
>
>
> //
> // Now want to do some analysis on the same text file
> //
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id
bq. split"\t," splits the filter by carriage return
Minor correction: "\t" denotes tab character.
On Sun, Apr 3, 2016 at 7:24 AM, Eliran Bivas wrote:
> Hi Mich,
>
> 1. The first underscore in your filter call is refering to a line in the
> file (as textFile() results in a
Looking at the implementation for lookup in PairRDDFunctions, I think your
understanding is correct.
On Sat, Apr 2, 2016 at 3:16 AM, Nirav Patel wrote:
> I will start by question: Is spark lookup function on pair rdd is a driver
> action. ie result is returned to driver?
ranshu...@gmail.com>
wrote:
> When I added *"org.apache.spark" % "spark-core_2.10" % "1.6.0", *it
> should include spark-core_2.10-1.6.1-tests.jar.
> Why do I need to use the jar file explicitly?
>
> And how do I use the jars for compiling with *
Thanks for sharing the workaround.
Probably send a PR on tranquilizer github :-)
On Fri, Apr 1, 2016 at 12:50 PM, Marcelo Oikawa wrote:
> Hi, list.
>
> Just to close the thread. Unfortunately, I didnt solve the jackson lib
> problem but I did a workaround that
ot;spark-core_2.10" %
>> "1.6.0", "org.apache.spark" % "spark-mllib_2.10" % "1.6.0" )*
>
>
>
>
> On Sat, Apr 2, 2016 at 2:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Assuming your code is written in Scala, I would suggest u
Assuming your code is written in Scala, I would suggest using ScalaTest.
Please take a look at the XXSuite.scala files under mllib/
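A minimal suite along the lines of those files could be (names are placeholders):
import org.scalatest.FunSuite
import org.apache.spark.mllib.linalg.Vectors

class MyModelSuite extends FunSuite {
  test("dense vector has the expected size") {
    assert(Vectors.dense(1.0, 2.0, 3.0).size === 3)
  }
}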
On Fri, Apr 1, 2016 at 1:31 PM, Shishir Anshuman
wrote:
> Hello,
>
> I have a code written in scala using Mllib. I want to perform unit
so do I have to set them for the history-server? The daemon? The workers?
>
> And what if I use the java API instead of spark-submit for the jobs?
>
> I guess that the spark-defaults.conf are obsolete for the java API?
>
>
> Am 2016-04-01 18:58, schrieb Ted Yu:
>
>&g
bq. This was a big help!
The email (maybe only addressed to you) didn't come with your latest reply.
Do you mind sharing it ?
Thanks
On Fri, Apr 1, 2016 at 11:37 AM, ludflu wrote:
> This was a big help! For the benefit of my fellow travelers running spark
> on
> EMR:
>
> I
You can set them in spark-defaults.conf
See also https://spark.apache.org/docs/latest/configuration.html#spark-ui
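For example, in spark-defaults.conf (the values are arbitrary here):
spark.ui.retainedJobs     500
spark.ui.retainedStages   500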
On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt wrote:
> Can somebody tell me the interaction between the properties:
>
> spark.ui.retainedJobs
>