SparkSQL was built to improve upon Hive on Spark runtime further...
On Tue, May 19, 2015 at 10:37 PM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Hive on Spark vs. SparkSQL: which should be better, and what are the key
characteristics, advantages, and disadvantages of each
Hi,
For IndexedRowMatrix and RowMatrix, both take RDD(vector)...is it possible
that it has intermixed dense and sparse vectors? Basically I am
considering a gemv flow when IndexedRowMatrix has the dense flag set to true,
and a dot flow otherwise...
Thanks.
Deb
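A minimal plain-Python sketch of the dispatch described in the question above, assuming a dense row is a list of floats and a sparse row is a dict from column index to value. These types and the function names are hypothetical stand-ins for illustration, not the MLlib vector classes or any Spark API.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical row representations for this sketch (not Spark MLlib types):
# a dense row is a list of floats, a sparse row maps column index -> value.
DenseRow = List[float]
SparseRow = Dict[int, float]


def dense_gemv(rows: Sequence[DenseRow], x: Sequence[float]) -> List[float]:
    """gemv-style flow: every row is dense, so compute y = A * x in one pass."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def sparse_dot(row: SparseRow, x: Sequence[float]) -> float:
    """dot-style flow: touch only the stored entries of a single sparse row."""
    return sum(value * x[col] for col, value in row.items())


def multiply(rows: Sequence[Union[DenseRow, SparseRow]],
             x: Sequence[float]) -> List[float]:
    """Use the gemv flow only when all rows are dense; otherwise go row by row."""
    if all(isinstance(row, list) for row in rows):
        return dense_gemv(rows, x)
    result = []
    for row in rows:
        if isinstance(row, dict):
            result.append(sparse_dot(row, x))
        else:
            result.append(sum(a * b for a, b in zip(row, x)))
    return result
```

The all-dense check plays the role of the "dense flag" in the question; with intermixed rows the sketch falls back to per-row dot products.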
The batch version of this is part of the rowSimilarities JIRA (SPARK-4823)...if your
query points can fit in memory, there is a broadcast version which we are
experimenting with internally...we are using brute force KNN right now in
the PR...based on the FLANN paper, LSH did not work well, but before you go to
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551294#comment-14551294
]
Debasish Das commented on SPARK-6323:
-
Petuum paper that got released today mentioned
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547318#comment-14547318
]
Debasish Das commented on SPARK-4823:
-
I opened up a PR that worked well for our
I opened it up today but it should help you:
https://github.com/apache/spark/pull/6213
On Sat, May 16, 2015 at 6:18 PM, Chunnan Yao yaochun...@gmail.com wrote:
Hi all,
Recently I've run into a scenario where I need to conduct two-sample tests between all
pairwise combinations of columns of an RDD. But the
[
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-4231:
Affects Version/s: (was: 1.2.0)
1.4.0
Add RankingMetrics
[
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das reopened SPARK-4231:
-
The code was not part of SPARK-3066 and so reopening...
Add RankingMetrics to examples.MovieLensALS
Cross join shuffle space might not be needed, since most likely through
application-specific logic (topK etc.) you can cut the shuffle space...Also
most likely the brute force approach will be a benchmark tool to see how
much better your clustering-based KNN solution is, since there are several ways
you
[
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512843#comment-14512843
]
Debasish Das commented on SPARK-5992:
-
Did someone compare Algebird LSH with Spark
If there is L1 from DB's OWLQN development, why do we need dropout
regularization ?
On Wed, Apr 15, 2015 at 8:59 PM, rakeshchalasani g...@git.apache.org wrote:
GitHub user rakeshchalasani opened a pull request:
https://github.com/apache/spark/pull/5539
Add dropout regularization to
, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am benchmarking row vs col similarity flow on 60M x 10M matrices...
Details are in this JIRA:
https://issues.apache.org/jira/browse/SPARK-4823
For testing I am using Netflix data since the structure is very similar:
50k x 17K near dense
Hi,
I have some code that creates ~80 RDDs and then a sc.union is applied to
combine all 80 into one for the next step (to run topByKey for example)...
While creating the 80 RDDs takes 3 mins per RDD, doing a union over them takes 3
hrs (I am validating these numbers)...
Is there any checkpoint
I have a version that works well for Netflix data but now I am validating
on internal datasets...this code will work on matrix factors and sparse
matrices that have rows = 100 * columns...if columns are much smaller than
rows then the col based flow works well...basically we need both flows...
I did
[
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484646#comment-14484646
]
Debasish Das commented on SPARK-3987:
-
@mengxr for this testcase it was fixed but I
[
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484646#comment-14484646
]
Debasish Das edited comment on SPARK-3987 at 4/8/15 3:31 AM
sorted list by using a priority queue and dequeuing top N
values.
In the end, I get a record for each segment with N max values for each
segment.
Regards,
Aung
On Fri, Mar 27, 2015 at 4:27 PM, Debasish Das debasish.da...@gmail.com
wrote:
In that case you can directly use count-min
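The per-segment top-N approach described in the message above (a priority queue per segment, keeping the N max values) can be sketched in plain Python roughly as follows. This is an illustration under stated assumptions rather than the poster's code: records are assumed to be (segment, value) pairs, and `top_n_per_segment` is a hypothetical name.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_n_per_segment(records: Iterable[Tuple[str, float]],
                      n: int) -> Dict[str, List[float]]:
    """Keep the n largest values seen for each segment, largest first."""
    if n <= 0:
        raise ValueError("n must be positive")
    # One bounded min-heap per segment; heap[0] is the smallest value kept.
    heaps: Dict[str, List[float]] = defaultdict(list)
    for segment, value in records:
        heap = heaps[segment]
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Evict the current smallest and insert the new, larger value.
            heapq.heapreplace(heap, value)
    return {segment: sorted(heap, reverse=True)
            for segment, heap in heaps.items()}
```

Because each heap never holds more than N values, memory per segment stays bounded regardless of how many records the segment has.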
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388128#comment-14388128
]
Debasish Das commented on SPARK-5564:
-
[~sparks] we are trying to access the EC2
[
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389973#comment-14389973
]
Debasish Das edited comment on SPARK-3066 at 4/1/15 4:28 AM
[
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389973#comment-14389973
]
Debasish Das commented on SPARK-3066:
-
Also unless the raw flow runs there is no way
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387180#comment-14387180
]
Debasish Das commented on SPARK-5564:
-
Cool...I will run my experiments on the same
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387180#comment-14387180
]
Debasish Das edited comment on SPARK-5564 at 3/30/15 6:52 PM
as I see the result. I am not sure if it is
supported by public packages like graphlab or scikit but the plsa papers
show interesting results.
On Mar 30, 2015 2:31 PM, Xiangrui Meng men...@gmail.com wrote:
On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386049#comment-14386049
]
Debasish Das commented on SPARK-5564:
-
[~josephkb] could you please point me
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386049#comment-14386049
]
Debasish Das edited comment on SPARK-5564 at 3/30/15 12:31 AM
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386049#comment-14386049
]
Debasish Das edited comment on SPARK-5564 at 3/30/15 12:30 AM
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-2426:
Affects Version/s: (was: 1.3.0)
1.4.0
Quadratic Minimization for MLlib
You can do it in-memory as well...get 10% topK elements from each
partition and use merge from any sort algorithm like timsort...basically
aggregateBy
Your version uses shuffle but this version is 0 shuffle...assuming your data
set is cached you will be using in-memory allReduce through
for your suggestions. In-memory version is quite useful. I do not
quite understand how you can use aggregateBy to get 10% top K elements. Can
you please give an example?
Thanks,
Aung
On Fri, Mar 27, 2015 at 2:40 PM, Debasish Das debasish.da...@gmail.com
wrote:
You can do it in-memory as well
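Since the reply above asks for an example, here is a rough plain-Python sketch of the idea as described: take the top K elements within each partition, then merge the already-sorted partial results. Partitions are modelled as plain lists and all function names are hypothetical. It does not use the Spark API, and the aggregate call the original poster had in mind is not shown here.

```python
import heapq
from itertools import islice
from typing import List, Sequence


def local_top_k(partition: Sequence[float], k: int) -> List[float]:
    """Per-partition step: the k largest elements, sorted descending."""
    return heapq.nlargest(k, partition)


def merge_top_k(partials: Sequence[List[float]], k: int) -> List[float]:
    """Merge step: k-way merge of descending lists, stopping after k items."""
    return list(islice(heapq.merge(*partials, reverse=True), k))


def top_k(partitions: Sequence[Sequence[float]], k: int) -> List[float]:
    """Global top k without redistributing the raw data between partitions."""
    return merge_top_k([local_top_k(p, k) for p in partitions], k)
```

For a "top 10%" query, k would be 10% of the total element count. Each partition only ever hands over at most k values, which is why no full shuffle of the data is needed in this scheme.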
is that ALM will support MAP
(and maybe KL divergence loss) with sparsity constraints (probability
simplex and bounds are fine for what I am focused on right now)...
Thanks.
Deb
On Tue, Feb 17, 2015 at 4:40 PM, Debasish Das debasish.da...@gmail.com
wrote:
There is a usability difference...I am not sure
Hi,
Right now LogisticGradient implements both binary and multi-class in the
same class using an if-else statement which is a bit convoluted.
For Generalized matrix factorization, if the data has distinct ratings I
want to use LeastSquareGradient (regression has given best results to date)
but
multiclass logistic loss/gradient. If it's not a big hit, then it
might be simpler from an outside API perspective to keep them in 1 class
(even if it's more complicated within).
Joseph
On Wed, Mar 25, 2015 at 8:15 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Right now
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377357#comment-14377357
]
Debasish Das edited comment on SPARK-2426 at 3/24/15 3:23 PM
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378062#comment-14378062
]
Debasish Das commented on SPARK-6323:
-
I did some more reading and realized that even
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377357#comment-14377357
]
Debasish Das commented on SPARK-2426:
-
[~acopich] From your comment before Anyway, l2
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377357#comment-14377357
]
Debasish Das edited comment on SPARK-2426 at 3/24/15 6:11 AM
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377357#comment-14377357
]
Debasish Das edited comment on SPARK-2426 at 3/24/15 6:13 AM
[
https://issues.apache.org/jira/browse/SPARK-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376046#comment-14376046
]
Debasish Das commented on SPARK-3735:
-
We might want to consider doing some
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375325#comment-14375325
]
Debasish Das commented on SPARK-2426:
-
[~acopich] There's a completely different loss
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-6323:
Description:
Currently ml.recommendation.ALS is optimized for gram matrix generation which
scales
to track this here: SPARK-6442
https://issues.apache.org/jira/browse/SPARK-6442
The design doc is here: http://goo.gl/sf5LCE
We would very much appreciate your feedback and input.
Best,
Burak
On Thu, Mar 19, 2015 at 3:06 PM, Debasish Das debasish.da...@gmail.com
wrote:
Yeah
There is also a batch prediction API in PR
https://github.com/apache/spark/pull/3098
The idea here is what Sean said...don't try to reconstruct the whole matrix,
which will be dense, but pick a set of users and calculate topk
recommendations for them using dense level 3 BLAS...we are going to merge
Hi David,
We are stress testing breeze.optimize.proximal and nnls...if you are
cutting a release now, we will need another release soon once we get the
runtime optimizations in place and merged to breeze.
Thanks.
Deb
On Mar 15, 2015 9:39 PM, David Hall david.lw.h...@gmail.com wrote:
snapshot
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956
]
Debasish Das edited comment on SPARK-6323 at 3/16/15 6:30 PM
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956
]
Debasish Das edited comment on SPARK-6323 at 3/15/15 4:29 PM
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956
]
Debasish Das edited comment on SPARK-6323 at 3/15/15 4:26 PM
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361981#comment-14361981
]
Debasish Das commented on SPARK-6323:
-
By the way I can close the JIRA
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956
]
Debasish Das commented on SPARK-6323:
-
g(z) is not regularization...we support
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361005#comment-14361005
]
Debasish Das edited comment on SPARK-6323 at 3/13/15 7:48 PM
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361005#comment-14361005
]
Debasish Das commented on SPARK-6323:
-
There are some other interesting cases
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-6323:
Description:
Currently ml.recommendation.ALS is optimized for gram matrix generation which
only
Debasish Das created SPARK-6323:
---
Summary: Large rank matrix factorization with Nonlinear loss and
constraints
Key: SPARK-6323
URL: https://issues.apache.org/jira/browse/SPARK-6323
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das resolved SPARK-4231.
-
Resolution: Duplicate
Add RankingMetrics to examples.MovieLensALS
[
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359892#comment-14359892
]
Debasish Das commented on SPARK-3066:
-
We use the non-level 3 BLAS code in our
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351839#comment-14351839
]
Debasish Das commented on SPARK-2426:
-
[~mengxr] NNLS and QuadraticMinimizer are both
[
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351945#comment-14351945
]
Debasish Das commented on SPARK-3066:
-
[~josephkb] do you mean knn
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351948#comment-14351948
]
Debasish Das commented on SPARK-4823:
-
[~mengxr] I need level 3 BLAS for this JIRA
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351948#comment-14351948
]
Debasish Das edited comment on SPARK-4823 at 3/8/15 6:42 AM
Column based similarities work well if the number of columns is modest (10K, 100K;
we actually scaled it to 1.5M columns but that really stress tests the shuffle
and you need to tune the shuffle parameters)...You can either use DIMSUM
sampling or come up with your own threshold based on your application that
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:41 PM
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:51 PM
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das commented on SPARK-5564:
-
I am right now using the following PR to do
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:20 PM
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:19 PM
[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342312#comment-14342312
]
Debasish Das commented on SPARK-5564:
-
By the way the following step
Any reason why the regularization path cannot be implemented using current
owlqn pr ?
We can change owlqn in breeze to fit your needs...
On Feb 24, 2015 3:27 PM, Joseph Bradley jos...@databricks.com wrote:
Hi Mike,
I'm not aware of a standard big dataset, but there are a number
available:
to use DIMSUM. Try to increase the threshold and see
whether it helps. -Xiangrui
On Tue, Feb 17, 2015 at 6:28 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am running brute force similarity from RowMatrix on a job with 5M x 1.5M
sparse matrix with 800M entries. With 200M
with 1.5m columns, because the output can potentially have 2.25 x
10^12 entries, which is a lot (1.5m squared).
Best,
Reza
On Wed, Feb 25, 2015 at 10:13 AM, Debasish Das debasish.da...@gmail.com
wrote:
Is the threshold valid only for tall skinny matrices ? Mine is 6 m x 1.5
m and I made
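As a quick check of the 2.25 x 10^12 figure quoted above, the arithmetic can be written out in a few lines of Python. The count of distinct unordered pairs is an addition of this sketch (it follows from the similarity matrix being symmetric) and is not stated in the thread.

```python
def all_pairs(n_cols: int) -> int:
    """Entries in a full n_cols x n_cols column-similarity matrix."""
    return n_cols ** 2


def distinct_pairs(n_cols: int) -> int:
    """Unordered column pairs (i, j) with i < j, i.e. the upper triangle."""
    return n_cols * (n_cols - 1) // 2


n_cols = 1_500_000
print(all_pairs(n_cols))       # 2250000000000, i.e. 2.25 x 10^12
print(distinct_pairs(n_cols))  # roughly half of the figure above
```

Either way the output is on the order of 10^12 entries, which is why the thread discusses thresholding and sampling rather than materializing every pair.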
that the key would be filtered.
And then after, run a flatMap or something to make Option[B] into B.
On Thu, Feb 19, 2015 at 2:21 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Before I send out the keys for network shuffle, in reduceByKey after map +
combine are done, I would like
Hi,
Before I send out the keys for network shuffle, in reduceByKey after map +
combine are done, I would like to filter the keys based on some threshold...
Is there a way to get the key, value after map+combine stages so that I can
run a filter on the keys ?
Thanks.
Deb
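A plain-Python sketch of what the question above describes: combine values per key inside each partition, drop keys whose partial result falls below a threshold, and only then do the final reduction. Partitions are modelled as lists of (key, value) pairs, sum is assumed as the reduce function, and the function names are hypothetical. This is not Spark code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, float]


def combine_partition(partition: Sequence[Pair]) -> Dict[Hashable, float]:
    """Map-side combine: sum the values per key within one partition."""
    combined: Dict[Hashable, float] = defaultdict(float)
    for key, value in partition:
        combined[key] += value
    return dict(combined)


def filter_then_reduce(partitions: Sequence[Sequence[Pair]],
                       threshold: float) -> Dict[Hashable, float]:
    """Filter partial sums per partition, then reduce the survivors by key."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for partition in partitions:
        for key, partial in combine_partition(partition).items():
            # Only keys that pass the filter would be sent over the network.
            if partial >= threshold:
                totals[key] += partial
    return dict(totals)
```

One caveat with this design: the filter sees per-partition partial sums, not global totals, so a key that is below the threshold in every partition is dropped even if its global sum would have passed. Whether that is acceptable depends on the application.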
partitions and apply your
filtering. Then you can finish with a reduceByKey.
On Thu, Feb 19, 2015 at 9:21 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Before I send out the keys for network shuffle, in reduceByKey after map
+ combine are done, I would like to filter the keys based
Hi,
Some of my jobs failed due to no space left on device and on those jobs I
was monitoring the shuffle space...when the job failed the shuffle space was
not cleaned up and I had to clean it manually...
Is there a JIRA already tracking this issue ? If no one has been assigned
to it, I can take a look.
by GC pause. Did you check the GC time in the Spark
UI? -Xiangrui
On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am sometimes getting WARN from running Similarity calculation:
15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
another
pass on your PR today. -Xiangrui
On Tue, Feb 10, 2015 at 8:01 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Will it be possible to merge this PR to 1.3 ?
https://github.com/apache/spark/pull/3098
The batch prediction API in ALS will be useful for us who want
. For a general matrix factorization package, let's
make a JIRA and move our discussion there. -Xiangrui
On Fri, Feb 13, 2015 at 7:46 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am a bit confused about the mllib design in the master. I thought that core
algorithms will stay
Hi,
I am running brute force similarity from RowMatrix on a job with a 5M x 1.5M
sparse matrix with 800M entries. With 200M entries the job runs fine, but
with 800M I am getting exceptions like too many open files and no space
left on device...
Seems like I need more nodes, or should I use DIMSUM sampling?
Hi,
I am sometimes getting WARN from running Similarity calculation:
15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
exceeds 45000ms
Do I need to increase the default 45 s to larger values for cases
...
Neither play nor spray is being used in Spark right now...so it brings
dependencies and we already know about the akka conflicts...thriftserver on
the other hand is already integrated for JDBC access
On Tue, Feb 10, 2015 at 3:43 PM, Debasish Das debasish.da...@gmail.com
wrote:
Also I wanted
Hi,
Will it be possible to merge this PR to 1.3 ?
https://github.com/apache/spark/pull/3098
The batch prediction API in ALS will be useful for us who want to cross
validate on prec@k and MAP...
Thanks.
Deb
Hi Michael,
I want to cache a RDD and define get() and set() operators on it. Basically
like memcached. Is it possible to build a memcached like distributed cache
using Spark SQL ? If not what do you suggest we should use for such
operations...
Thanks.
Deb
On Fri, Jul 18, 2014 at 1:00 PM,
-indexedrdd
On Tue, Feb 10, 2015 at 2:27 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Michael,
I want to cache a RDD and define get() and set() operators on it.
Basically like memcached. Is it possible to build a memcached like
distributed cache using Spark SQL ? If not what do you
PM, Debasish Das debasish.da...@gmail.com
wrote:
Thanks...this is what I was looking for...
It will be great if Ankur can give brief details about it...Basically how
does it contrast with memcached for example...
On Tue, Feb 10, 2015 at 2:32 PM, Michael Armbrust mich...@databricks.com
wrote
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302932#comment-14302932
]
Debasish Das commented on SPARK-2426:
-
[~mengxr] [~coderxiang] David is out in Feb
Congratulations !
Keep helping the community :-)
On Tue, Feb 3, 2015 at 5:34 PM, Denny Lee denny.g@gmail.com wrote:
Awesome stuff - congratulations! :)
On Tue Feb 03 2015 at 5:34:06 PM Chao Chen crazy...@gmail.com wrote:
Congratulations guys, well done!
On 15-2-4 at 9:26 AM, Nan Zhu
Hi Dib,
For our usecase I want my spark job1 to read from hdfs/cache and write to
kafka queues. Similarly spark job2 should read from kafka queues and write
to kafka queues.
Is writing to kafka queues from spark job supported in your code ?
Thanks
Deb
On Jan 15, 2015 11:21 PM, Akhil Das
For CDH this works well for me...tested till 5.1...
./make-distribution -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn
-Phive -DskipTests
To build with hive thriftserver support for spark-sql
On Fri, Dec 12, 2014 at 1:41 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
Hi all – we’re
protobuf comes from missing -Phadoop-2.3
On Fri, Dec 12, 2014 at 2:34 PM, Sean Owen so...@cloudera.com wrote:
What errors do you see? protobuf errors usually mean you didn't build
for the right version of Hadoop, but if you are using -Phadoop-2.3 or
better -Phadoop-2.4 that should be fine.
[
https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243026#comment-14243026
]
Debasish Das commented on SPARK-4675:
-
Is there a metric like MAP / AUC kind
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243048#comment-14243048
]
Debasish Das commented on SPARK-4823:
-
[~srowen] did you implement map-reduce row
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243149#comment-14243149
]
Debasish Das commented on SPARK-2426:
-
[~mengxr] as per our discussion
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243207#comment-14243207
]
Debasish Das commented on SPARK-4823:
-
Even for matrix factorization userFactors