> ... does not get expanded by the shell).
>
> But it's really weird to be setting SPARK_HOME in the environment of
> your node managers. YARN shouldn't need to know about that.
> On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang
> wrote:
> >
> >
> https://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4bb97a9d78f5b524128/c
... from your gateway machine to YARN by
> default.
>
> You probably have some configuration (in spark-defaults.conf) that
> tells YARN to use a cached copy. Get rid of that configuration, and
> you can use whatever version you like.
> On Thu, Oct 4, 2018 at 2:19 AM Jianshi Huang
> wrote:
er-1.cluster-68492:9000/lib/py4j-0.10.7-src.zip']
> sc = pyspark.SparkContext(appName="Jianshi", master="yarn-client",
> conf=sparkConf, pyFiles=py_files)
>
>
Thanks,
--
Jianshi Huang
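For reference, a minimal sketch of the kind of spark-defaults.conf setting the reply above refers to, assuming Spark 2.x on YARN; shown here as SparkConf calls, and the HDFS path is hypothetical:

import org.apache.spark.SparkConf

// spark.yarn.archive (or spark.yarn.jars) makes YARN reuse a pre-staged copy of
// the Spark jars instead of uploading them from the gateway machine. If such a
// setting points at an old build, remove or repoint it so the version you launch
// with is the one that actually runs on the cluster.
val conf = new SparkConf()
  .set("spark.yarn.archive", "hdfs:///lib/spark/spark-archive.zip") // hypothetical path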
No one using History server? :)
Am I the only one who needs to see all users' logs?
Jianshi
On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I'm using Spark 1.4.0-rc1 and I'm using default settings for history
server.
But I can only see my own logs
directory.
Hi,
I'm using Spark 1.4.0-rc1 and I'm using default settings for history server.
But I can only see my own logs. Is it possible to view all users' logs? The
permission is fine for the user group.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
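In case it helps, a sketch of the settings usually involved, shown as SparkConf calls for illustration only (they normally live in spark-defaults.conf, and the paths are hypothetical). Whether other users' applications show up is usually governed by the filesystem permissions on the event-log files rather than by these keys:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")                               // applications write event logs
  .set("spark.eventLog.dir", "hdfs:///shared/spark-events")            // shared, group-readable directory
  .set("spark.history.fs.logDirectory", "hdfs:///shared/spark-events") // the history server reads the same directory
  .set("spark.history.ui.acls.enable", "false")                        // false = no per-user ACL check in the UI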
))
PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at
explain at <console>:32
Jianshi
On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot ssab...@gmail.com
wrote:
can you post the explain too ?
On Tue, May 12, 2015 at 12:11 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I
is still open,
when can we have it fixed? :)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
I'm using the default settings.
Jianshi
On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva twinkle.sachd...@gmail.com
wrote:
Hi,
Can you please share your compression and other settings which you are using?
Thanks,
Twinkle
On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com
, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote:
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version
of Spark?
On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
My data looks
(MessageColumnIO.java:96)
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
Hi,
I want to write this in Spark SQL DSL:
select map('c1', c1, 'c2', c2) as m
from table
Is there a way?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
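Not an answer from the thread, but one way this can be expressed in the DataFrame DSL is through selectExpr, assuming a Spark version whose expression parser supports map(...) (e.g. via HiveContext); a sketch only:

val df = sqlContext.table("table")
// build a column-name -> value map per row, aliased as m
val withMap = df.selectExpr("map('c1', c1, 'c2', c2) as m")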
..., friction can be a huge factor in the equations; in some others it is just
part of the landscape
*From:* Gerard Maas [mailto:gerard.m...@gmail.com]
*Sent:* Friday, April 17, 2015 10:12 AM
*To:* Evo Eftimov
*Cc:* Tathagata Das; Jianshi Huang; user; Shao, Saisai; Huang Jie
*Subject:* Re: How
- multiple DStreams)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
Hi,
Anyone has similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we save a DataFrame into Parquet files, we also want to have it
partitioned.
The proposed API looks like this:
def saveAsParquet(path: String, partitionColumns: Seq[String])
--
Jianshi Huang
LinkedIn
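For context, later Spark releases ended up exposing partitioned Parquet output through DataFrameWriter rather than the exact signature proposed above; a sketch, with a hypothetical output path:

// partitionBy writes one sub-directory per distinct value of the column,
// e.g. .../date=2015-03-27/part-*.parquet
df.write.partitionBy("date").parquet("hdfs:///output/table")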
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
We're facing No space left on device errors lately from time to time.
The job will fail after retries. Obviously in such cases, retry won't
spark.scheduler.executorTaskBlacklistTime to 30000 to solve such "No
space left on device" errors. So if a task runs unsuccessfully in some
executor, it won't be scheduled to the same executor within 30 seconds.
Best Regards,
Shixiong Zhu
2015-03-16 17:40 GMT+08:00 Jianshi Huang jianshi.hu...@gmail.com:
I created a JIRA
Oh, by default it's set to 0L.
I'll try setting it to 30000 immediately. Thanks for the help!
Jianshi
On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Thanks Shixiong!
Very strange that our tasks were retried on the same executor again and
again. I'll check
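A minimal sketch of the workaround discussed above (the property was an internal, undocumented setting in that era of Spark and takes milliseconds):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // a task that fails on an executor is not rescheduled on the same executor
  // for 30 seconds, so nodes hitting "No space left on device" get avoided
  .set("spark.scheduler.executorTaskBlacklistTime", "30000")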
Forget about my last message. I was confused. Spark 1.2.1 + Scala 2.10.4
started by SBT console command also failed with this error. However running
from a standard spark shell works.
Jianshi
On Fri, Mar 13, 2015 at 2:46 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... look like
Hmm... looks like the console command still starts Spark 1.3.0 with Scala
2.11.6 even though I changed them in build.sbt.
So the test with 1.2.1 is not valid.
Jianshi
On Fri, Mar 13, 2015 at 2:34 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I've confirmed it only failed in console started
[info] try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
[info] }
[info]
[info] @transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
[info] implicit def sqlContext = sqlc
[info] import sqlc._
Jianshi
On Fri, Mar 13, 2015 at 3:10 AM, Jianshi Huang jianshi.hu...@gmail.com
Liancheng also found out that the Spark jars are not included in the
classpath of URLClassLoader.
Hmm... we're very close to the truth now.
Jianshi
On Fri, Mar 13, 2015 at 6:03 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I'm almost certain the problem is the ClassLoader.
So adding
.
Thanks
Ashish
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
I'm almost certain the problem is the ClassLoader.
So adding
fork := true
solves problems for test and run.
The problem is: how can I fork a JVM for sbt console? fork in console :=
true doesn't seem to work...
Jianshi
On Fri, Mar 13, 2015 at 4:35 PM, Jianshi Huang jianshi.hu...@gmail.com
:23 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Same issue here. But the classloader in my exception is somehow different.
scala.ScalaReflectionException: class
org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with
java.net.URLClassLoader@53298398 of type class
<spark.version>1.2.1</spark.version>
<scala.version>2.11.5</scala.version>
Please let me know how I can resolve this problem.
Thanks
Ashish
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
:
We don't support expressions or wildcards in that configuration. For
each application, the local directories need to be constant. If you
have users submitting different Spark applications, those can each set
spark.local.dirs.
- Patrick
On Wed, Mar 11, 2015 at 12:14 AM, Jianshi Huang
directories either. Typically, like in YARN, you would have a number of
directories (on different disks) mounted and configured for local
storage for jobs.
On Wed, Mar 11, 2015 at 7:42 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Unfortunately /tmp mount is really small in our environment. I
Hi,
I need to set per-user spark.local.dir, how can I do that?
I tried both
/x/home/${user.name}/spark/tmp
and
/x/home/${USER}/spark/tmp
And neither worked. Looks like it has to be a constant setting in
spark-defaults.conf. Right?
Any ideas how to do that?
Thanks,
--
Jianshi Huang
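A sketch of the per-application approach the replies point at: since values in spark-defaults.conf are taken literally, resolve the user name on the submitting side instead, for example in the application itself:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // sys.props("user.name") is resolved by the JVM at submit time,
  // so each user's application gets its own local directory
  .set("spark.local.dir", s"/x/home/${sys.props("user.name")}/spark/tmp")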
:01 PM, Shao, Saisai saisai.s...@intel.com wrote:
I think there are a lot of JIRAs trying to solve this problem (
https://issues.apache.org/jira/browse/SPARK-5763). Basically sort merge
join is a good choice.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent
One really interesting thing is that when I'm using the
netty-based spark.shuffle.blockTransferService, there's no OOM error
messages (java.lang.OutOfMemoryError: Java heap space).
Any idea why it's not here?
I'm using Spark 1.2.1.
Jianshi
On Thu, Mar 5, 2015 at 1:56 PM, Jianshi Huang jianshi.hu
, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I see. I'm using core's join. The data might have some skewness
(checking).
I understand shuffle can spill data to disk but when consuming it, say in
cogroup or groupByKey, it still needs to read the whole group of elements,
right? I guess OOM happened
is skewed or key number is
smaller, so you will meet OOM.
Maybe you could monitor each stage or task’s shuffle and GC status also
system status to identify the problem.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Thursday, March 5, 2015 2:32 PM
*To:* Aaron
the shuffle related operations can spill the
data to disk and don't need to read the whole partition into memory. But if
you use SparkSQL, it depends on how SparkSQL uses these operators.
CC @hao if he has some thoughts on it.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu
, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... ok, previous errors are still block fetch errors.
15/03/03 10:22:40 ERROR RetryingBlockFetcher: Exception while beginning
fetch of 11 outstanding blocks
java.io.IOException: Failed to connect to host-/:55597
-SNAPSHOT I built around Dec. 20. Are there any
bug fixes related to shuffle block fetching or index files after that?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
Jianshi
On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got this error
Davidson ilike...@gmail.com wrote:
Drat! That doesn't help. Could you scan from the top to see if there were
any fatal errors preceding these? Sometimes an OOM will cause this type of
issue further down.
On Tue, Mar 3, 2015 at 8:16 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
The failed
check its logs as well.
On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Sorry that I forgot the subject.
And in the driver, I got many FetchFailedException. The error messages are
15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in stage 2.2 (TID
7943
: https://issues.apache.org/jira/browse/SPARK-5828
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
:
I think we made the binary protocol compatible across all versions, so you
should be fine with using any one of them. 1.2.1 is probably the best since
it is the most recent stable release.
On Tue, Feb 10, 2015 at 8:43 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I need to use
, 1.3.0)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
Hi,
Has anyone implemented the default Pig Loader in Spark? (loading delimited
text files with .pig_schema)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
)
at
org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
I'm using latest branch-1.2
I found in PR that percentile and percentile_approx are supported. A bug?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
SimpleGenericUDAFParameterInfo(inspectors.toArray, false, false)
resolver.getEvaluator(parameterInfo)
FYI
On Tue, Jan 13, 2015 at 1:51 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
The following SQL query
select percentile_approx(variables.var1, 0.95) p95
from model
will throw
ERROR
FYI,
Latest hive 0.14/parquet will have column renaming support.
Jianshi
On Wed, Dec 10, 2014 at 3:37 AM, Michael Armbrust mich...@databricks.com
wrote:
You might also try out the recently added support for views.
On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote
, 2014 at 8:28 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Ok, found another possible bug in Hive.
My current solution is to use ALTER TABLE CHANGE to rename the column
names.
The problem is that after renaming the column names, the values of the columns
became all NULL.
Before renaming
Very interesting: the line doing drop table will throw an exception. After
removing it, all works.
Jianshi
On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Here's the solution I got after talking with Liancheng:
1) using backquote `..` to wrap up all illegal
Hmm... another issue I found doing this approach is that ANALYZE TABLE ...
COMPUTE STATISTICS will fail to attach the metadata to the table, and later
broadcast join and such will fail...
Any idea how to fix this issue?
Jianshi
On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang jianshi.hu
sql("select cre_ts from pmt limit 1").collect
res16: Array[org.apache.spark.sql.Row] = Array([null])
I created a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-4781
Jianshi
On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... another issue I found
Hmm..
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
Jianshi
On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
I'm currently converting each Map to a JSON String
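A sketch of the JSON route being described, assuming flat maps of simple values; rdd and sqlContext stand in for the objects from the thread, JSONObject is the Scala standard-library helper of that era, and jsonRDD is the Spark 1.x SQLContext API:

import scala.util.parsing.json.JSONObject

// serialize each Map to a JSON string, then let Spark SQL infer the schema
val jsonStrings = rdd.map(m => JSONObject(m).toString())
val schemaRdd = sqlContext.jsonRDD(jsonStrings)
schemaRdd.registerTempTable("records")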
fine for me on master. Note that Hive does print an
exception in the logs, but that exception does not propagate to user code.
On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got an exception saying Hive: NoSuchObjectException(message:table table
table pmt (
sorted::id bigint
)
stored as parquet
location '...'
Obviously it didn't work; I also tried removing the identifier sorted::,
but the resulting rows contain only nulls.
Any idea how to create a table in HiveContext from these Parquet files?
Thanks,
Jianshi
--
Jianshi Huang
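Not from the thread, but a sketch of an alternative that avoids the DDL entirely: read the Parquet files directly and register them as a temporary table (Spark 1.2-era API; the path is hypothetical):

// parquetFile reads the footers and reconstructs the schema, including the
// "sorted::id"-style field names, without needing a Hive table definition
val t = sqlContext.parquetFile("hdfs:///data/pmt")
t.registerTempTable("pmt")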
(t.schema.fields.map(s => s.copy(name =
s.name.replaceAll(".*?::", ""))))
sql(s"drop table $name")
applySchema(t, newSchema).registerTempTable(name)
I'm testing it for now.
Thanks for the help!
Jianshi
On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I had
using latest Spark built from master HEAD yesterday. Is this a bug?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
?
Jianshi
On Fri, Dec 5, 2014 at 11:37 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I got the following error during Spark startup (Yarn-client mode):
14/12/04 19:33:58 INFO Client: Uploading resource
file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar
-
hdfs
Actually my HADOOP_CLASSPATH has already been set to include
/etc/hadoop/conf/*
export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath)
Jianshi
On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Looks like
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in
Yarn-client mode.
Maybe this patch broke yarn-client.
https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53
Jianshi
On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote
Correction:
According to Liancheng, this hotfix might be the root cause:
https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce
Jianshi
On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Looks like the datanucleus*.jar shouldn't appear
-most
among the inner joins;
DESC EXTENDED tablename; -- this will print the detail information for the
statistic table size (the field “totalSize”)
EXPLAIN EXTENDED query; -- this will print the detail physical plan.
Let me know if you still have problem.
Hao
*From:* Jianshi Huang
, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Sorry for the late follow-up.
I used Hao's DESC EXTENDED command and found some clue:
new (broadcast broken Spark build):
parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892,
COLUMN_STATS_ACCURATE=false, totalSize=0
With Liancheng's suggestion, I've tried setting
spark.sql.hive.convertMetastoreParquet false
but analyze noscan still returns -1 in rawDataSize
Jianshi
On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
If I run ANALYZE without NOSCAN, then Hive can successfully
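For reference, a sketch of the two steps being discussed, issued from a HiveContext (the table name is the one used earlier in the thread):

// use Hive's own Parquet SerDe instead of Spark SQL's native Parquet conversion
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
// gather table statistics; with NOSCAN only file-level stats (size, number of files) are collected
sqlContext.sql("ANALYZE TABLE pmt COMPUTE STATISTICS NOSCAN")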
/3270 should be
another optimization for this.
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Wednesday, November 26, 2014 4:36 PM
*To:* user
*Subject:* Auto BroadcastJoin optimization failed in latest Spark
Hi,
I've confirmed that the latest Spark with either Hive
similar situation?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
Using the same DDL and Analyze script above.
Jianshi
On Sat, Oct 11, 2014 at 2:18 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
It works fine, thanks for the help Michael.
Liancheng
/usr/lib/hive/lib doesn’t show any of the parquet
jars, but ls /usr/lib/impala/lib shows the jar we’re looking for as
parquet-hive-1.0.jar
Is it removed from latest Spark?
Jianshi
On Wed, Nov 26, 2014 at 2:13 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
Looks like the latest SparkSQL
Hi,
I got an error during rdd.registerTempTable(...) saying scala.MatchError:
scala.BigInt
Looks like BigInt cannot be used in SchemaRDD, is that correct?
So what would you recommend to deal with it?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
@gmail.com wrote:
Hello Jianshi,
The reason of that error is that we do not have a Spark SQL data type for
Scala BigInt. You can use Decimal for your case.
Thanks,
Yin
On Fri, Nov 21, 2014 at 5:11 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got an error during
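Following Yin's suggestion above, a sketch of mapping scala.BigInt to BigDecimal (which Spark SQL maps to its Decimal type) before building the SchemaRDD; the case class, field names and rawRdd are made up for illustration:

case class Rec(id: String, amount: BigDecimal) // BigDecimal maps to Spark SQL's Decimal type

// rawRdd: RDD[(String, BigInt)] stands in for the data from the thread
val fixed = rawRdd.map { case (id, amount) => Rec(id, BigDecimal(amount)) }
sqlContext.createSchemaRDD(fixed).registerTempTable("recs")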
the build instructions here :
https://github.com/ScrapCodes/spark-1/blob/patch-3/docs/building-spark.md
Prashant Sharma
On Tue, Nov 18, 2014 at 12:19 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Any notable issues for using Scala 2.11? Is it stable now?
Or can I use Scala 2.11 in my
, Nov 14, 2014 at 2:49 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Ok, then we need another trick.
let's have an *implicit lazy var connection/context* around our code.
And setup() will trigger the eval and initialization.
Due to lazy evaluation, I think having setup/teardown is a bit
Any notable issues for using Scala 2.11? Is it stable now?
Or can I use Scala 2.11 in my spark application and use Spark dist build
with 2.10 ?
I'm looking forward to migrating to 2.11 for some quasiquote features.
Couldn't make it run in 2.10...
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
: scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler
mirror not found. - [Help 1]
Does anyone know what the problem is?
I'm building it on OSX. I didn't have this problem one month ago.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
-mapreduce-to-apache-spark/
On 11/14/14 10:44 AM, Dai, Kevin wrote:
Hi, all
Is there a setup and cleanup function, as in Hadoop MapReduce, in Spark which
does some initialization and cleanup work?
Best Regards,
Kevin.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github
@gmail.com wrote:
If you’re just relying on the side effect of setup() and cleanup() then
I think this trick is OK and pretty clean.
But if setup() returns, say, a DB connection, then the map(...) part and
cleanup() can’t get the connection object.
On 11/14/14 1:20 PM, Jianshi Huang wrote:
So
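A minimal sketch of the per-partition pattern that usually stands in for setup()/cleanup() in Spark, and which also keeps the connection visible to both the processing and the cleanup code (createConnection and process are hypothetical helpers):

val result = rdd.mapPartitions { iter =>
  val conn = createConnection()                          // setup: once per partition, on the executor
  val out = iter.map(rec => process(conn, rec)).toList   // materialize before closing the connection
  conn.close()                                           // cleanup: after the partition is consumed
  out.iterator
}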
needs to be
collected to the driver; is there a way to avoid doing this?
Thanks
Jianshi
On Mon, Oct 27, 2014 at 4:57 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Sure, let's still focus on the streaming simulation use case. It's a very
useful problem to solve.
If we're going to use the same
-version suffixes in:
org.scalamacros:quasiquotes
On Thu, Oct 30, 2014 at 9:50 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi Preshant, Chester, Mohammed,
I switched to Spark's Akka and now it works well. Thanks for the help!
(Need to exclude Akka from Spray dependencies, or specify
that.
Can you try a Spray version built with 2.2.x along with Spark 1.1 and
include the Akka dependencies in your project’s sbt file?
Mohammed
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Tuesday, October 28, 2014 8:58 PM
*To:* Mohammed Guller
*Cc:* user
*Subject:* Re
has idea what went wrong? Need help!
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
<akka.version>2.3.4-spark</akka.version>
it should solve the problem. Makes sense? I'll give it a shot when I have time;
for now I'll probably just not use the Spray client...
Cheers,
Jianshi
On Tue, Oct 28, 2014 at 6:02 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got the following
I'm using Spark built from HEAD, I think it uses modified Akka 2.3.4, right?
Jianshi
On Wed, Oct 29, 2014 at 5:53 AM, Mohammed Guller moham...@glassbeam.com
wrote:
Try a version built with Akka 2.2.x
Mohammed
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Tuesday
?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
at
it.
For your case, I think TD's comments are quite meaningful; it's not trivial
to do so and often requires a job to scan all the records. It's also not the
design purpose of Spark Streaming, so I guess it's hard to achieve what you
want.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu
to arrange data, but
you cannot avoid scanning the whole data. Basically we need to avoid
fetching a large amount of data back to the driver.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 2:39 PM
*To:* Shao, Saisai
*Cc:* user
PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
You're absolutely right, it's not 'scalable' as I'm using collect().
However, it's important to have the RDDs ordered by the timestamp of the
time window (groupBy puts data into the corresponding time window).
It's fairly easy to do in Pig, but somehow
nested RDD in closure.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 3:30 PM
*To:* Shao, Saisai
*Cc:* user@spark.apache.org; Tathagata Das (t...@databricks.com)
*Subject:* Re: RDD to DStream
Ok, back to Scala code, I'm wondering
HiveContext is a subclass; we should keep the same
semantics as the default. Makes sense?
Spark is very much functional and shared-nothing; these are wonderful
features. Let's not have something global as a dependency.
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
On Mon, Oct 27, 2014 at 4:44 PM, Shao, Saisai saisai.s...@intel.com wrote:
Yes, I understand what you want, but it may be hard to achieve without
collecting back to the driver node.
Besides, can we just think of another way to do it?
Thanks
Jerry
*From:* Jianshi Huang
Any suggestion? :)
Jianshi
On Thu, Oct 23, 2014 at 3:49 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
The Kafka stream has 10 topics and the data rate is quite high (~ 100K/s
per topic).
Which configuration do you recommend?
- 1 Spark app consuming all Kafka topics
- 10 separate Spark
can use an in-memory Derby
database as metastore
https://db.apache.org/derby/docs/10.7/devguide/cdevdvlpinmemdb.html
I'll investigate this when free, guess we can use this for Spark SQL Hive
support testing.
On 10/27/14 4:38 PM, Jianshi Huang wrote:
There's an annoying small usability
)
}
}
}
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
program from time to time.
Is there a mechanism by which Spark Streaming can load and plug in code at runtime
without restarting?
Any solutions or suggestions?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
The Kafka stream has 10 topics and the data rate is quite high (~ 100K/s
per topic).
Which configuration do you recommend?
- 1 Spark app consuming all Kafka topics
- 10 separate Spark app each consuming one topic
Assuming they have the same resource pool.
Cheers,
--
Jianshi Huang
LinkedIn
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
dim tables (using
HiveContext) and then map it to my class object. It failed a couple of
times and now I cached the intermediate table and currently it seems to be
working fine... I had no idea why until I found SPARK-3106
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
Hmm... it failed again, just lasted a little bit longer.
Jianshi
On Mon, Oct 13, 2014 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
https://issues.apache.org/jira/browse/SPARK-3106
I'm having the same errors described in SPARK-3106 (no other types of
errors confirmed), running
Turned out it was caused by this issue:
https://issues.apache.org/jira/browse/SPARK-3923
Set spark.akka.heartbeat.interval to 100 solved it.
Jianshi
On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... it failed again, just lasted a little bit longer.
Jianshi
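The fix above, as a sketch (spark.akka.heartbeat.interval is in seconds in that era of Spark; 100 is the value from the thread):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.akka.heartbeat.interval", "100") // work around SPARK-3923 as described above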