Re: Spark Streaming idempotent writes to HDFS

2015-11-23 Thread Burak Yavuz
Not sure if it would be the most efficient, but maybe you can think of the filesystem as a key value store, and write each batch to a sub-directory, where the directory name is the batch time. If the directory already exists, then you shouldn't write it. Then you may have a following batch job
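
The sub-directory-per-batch idea in this reply can be sketched in plain Python. This is only an illustration under stated assumptions: a local directory stands in for HDFS, and the function name `write_batch` and the file name `part-00000` are hypothetical, not taken from the thread. The batch is written to a temporary directory first and then renamed, so a half-written batch never appears under its final name.

```python
import os
import tempfile


def write_batch(base_dir, batch_time_ms, records):
    """Write one batch to base_dir/<batch_time_ms>/ unless it already exists.

    Returns True if the batch was written, False if it was skipped.
    """
    final_dir = os.path.join(base_dir, str(batch_time_ms))
    if os.path.exists(final_dir):
        # The batch time acts as the key: an existing directory means
        # this batch was already written, so a replay is a no-op.
        return False

    # Write into a hidden temporary directory, then rename it into place.
    tmp_dir = tempfile.mkdtemp(prefix=".tmp-", dir=base_dir)
    with open(os.path.join(tmp_dir, "part-00000"), "w") as f:
        for record in records:
            f.write(f"{record}\n")
    os.rename(tmp_dir, final_dir)
    return True
```

Calling `write_batch` twice with the same batch time writes the data once; the second call returns `False`. The existence check and the rename are not one atomic step, so this sketch assumes a single writer per batch time.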

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 7d4aa6: Translated using Weblate (Turkish)

2015-11-18 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 7d4aa69470082c559894fe89f47d594071f30f77 https://github.com/phpmyadmin/phpmyadmin/commit/7d4aa69470082c559894fe89f47d594071f30f77 Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] a039b6: Translated using Weblate (Turkish)

2015-11-17 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: a039b6232ae3f45e0115411587f088b419cb3b63 https://github.com/phpmyadmin/phpmyadmin/commit/a039b6232ae3f45e0115411587f088b419cb3b63 Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] c7378b: Translated using Weblate (Turkish)

2015-11-14 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: c7378b15cff94f1780969ce4c181d7a5a1c00f2c https://github.com/phpmyadmin/phpmyadmin/commit/c7378b15cff94f1780969ce4c181d7a5a1c00f2c Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[phpMyAdmin Git] [phpmyadmin/localized_docs] abb554: Translated using Weblate (Turkish)

2015-11-13 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/localized_docs Commit: abb554fa28ccaee33437b0b2d2b1da52a81b912e https://github.com/phpmyadmin/localized_docs/commit/abb554fa28ccaee33437b0b2d2b1da52a81b912e Author: Burak Yavuz <hitowerdi...@hotmail.com> Date:

[jira] [Created] (SPARK-11731) Enable batching on Driver WriteAheadLog by default

2015-11-13 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-11731: --- Summary: Enable batching on Driver WriteAheadLog by default Key: SPARK-11731 URL: https://issues.apache.org/jira/browse/SPARK-11731 Project: Spark Issue Type

Re: large, dense matrix multiplication

2015-11-13 Thread Burak Yavuz
Hi, The BlockMatrix multiplication should be much more efficient on the current master (and will be available with Spark 1.6). Could you please give that a try if you have the chance? Thanks, Burak On Fri, Nov 13, 2015 at 10:11 AM, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote:

[phpMyAdmin Git] [phpmyadmin/localized_docs] d18291: Translated using Weblate (Albanian)

2015-11-12 Thread Burak Yavuz
15) Changed paths: M po/es.mo M po/es.po Log Message: --- Translated using Weblate (Spanish) Currently translated at 87.2% (1667 of 1911 strings) [CI skip] Commit: 72697ec297bb8c4f31ac3db3df37117aa2feaaeb https://github.com/phpmyadmin/localized_docs/commit/72697ec

Re: Spark Packages Configuration Not Found

2015-11-11 Thread Burak Yavuz
Hi Jakob, > As another, general question, are spark packages the go-to way of extending spark functionality? Definitely. There are ~150 Spark Packages out there on spark-packages.org. I use a lot of them in everyday Spark work. The number of released packages have steadily increased rate over

[jira] [Created] (SPARK-11639) Flaky test: BatchedWriteAheadLog - name log with aggregated entries with the timestamp of last entry

2015-11-10 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-11639: --- Summary: Flaky test: BatchedWriteAheadLog - name log with aggregated entries with the timestamp of last entry Key: SPARK-11639 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Commented] (SPARK-11198) Support record de-aggregation in KinesisReceiver

2015-11-02 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985701#comment-14985701 ] Burak Yavuz commented on SPARK-11198: - Just tested this. It works during regular operation

[jira] [Created] (SPARK-11419) WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled

2015-10-30 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-11419: --- Summary: WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled Key: SPARK-11419 URL: https://issues.apache.org/jira/browse/SPARK-11419 Project

[jira] [Commented] (SPARK-11198) Support record de-aggregation in KinesisReceiver

2015-10-30 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982926#comment-14982926 ] Burak Yavuz commented on SPARK-11198: - [~boneill42], did you need to do anything special for de

[jira] [Created] (SPARK-11324) Flag to close Write Ahead Log after writing

2015-10-26 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-11324: --- Summary: Flag to close Write Ahead Log after writing Key: SPARK-11324 URL: https://issues.apache.org/jira/browse/SPARK-11324 Project: Spark Issue Type

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 6b5b3b: Translated using Weblate (Turkish)

2015-10-23 Thread Burak Yavuz
Branch: refs/heads/QA_4_5 Home: https://github.com/phpmyadmin/phpmyadmin Commit: 6b5b3b395bb73ff3e6bb1948b1c69d379b78ae7d https://github.com/phpmyadmin/phpmyadmin/commit/6b5b3b395bb73ff3e6bb1948b1c69d379b78ae7d Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 2a1db0: Translated using Weblate (Turkish)

2015-10-23 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 2a1db0a27fcfdfe90e496a8a316642c404f620bf https://github.com/phpmyadmin/phpmyadmin/commit/2a1db0a27fcfdfe90e496a8a316642c404f620bf Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] a2f0f1: Translated using Weblate (Turkish)

2015-10-16 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: a2f0f17cab64d953172fed279cf76a104f5f99bd https://github.com/phpmyadmin/phpmyadmin/commit/a2f0f17cab64d953172fed279cf76a104f5f99bd Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[jira] [Created] (SPARK-11141) Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes

2015-10-15 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-11141: --- Summary: Batching of ReceivedBlockTrackerLogEvents for efficient WAL writes Key: SPARK-11141 URL: https://issues.apache.org/jira/browse/SPARK-11141 Project: Spark

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 5a4022: Translated using Weblate (Finnish)

2015-10-14 Thread Burak Yavuz
.po Log Message: --- Translated using Weblate (Italian) Currently translated at 100.0% (3211 of 3211 strings) [CI skip] Commit: 7aa6cf23ea5035be48a1b8c03548882cd363afcb https://github.com/phpmyadmin/phpmyadmin/commit/7aa6cf23ea5035be48a1b8c03548882cd363afcb Author: Burak Yavuz <hitowerdi..

Re: spark-submit --packages using different resolver

2015-10-03 Thread Burak Yavuz
Hi Jerry, The --packages feature doesn't support private repositories right now. However, in the case of S3, it might work. Could you please try using the --repositories flag and provide the address: `$ spark-submit --packages my:awesome:package --repositories

[phpMyAdmin Git] [phpmyadmin/localized_docs] bb447e: Translated using Weblate (Armenian)

2015-09-30 Thread Burak Yavuz
2b17630752 https://github.com/phpmyadmin/localized_docs/commit/e9ee84ea686d81b08bac7cb1b4e0622b17630752 Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015-09-30 (Wed, 30 Sep 2015) Changed paths: M po/tr.mo M po/tr.po Log Message: --- Translated usi

[jira] [Created] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka

2015-09-30 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-10891: --- Summary: Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka Key: SPARK-10891 URL: https://issues.apache.org/jira/browse/SPARK-10891 Project: Spark

[jira] [Commented] (SPARK-10889) Upgrade Kinesis Client Library

2015-09-30 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939023#comment-14939023 ] Burak Yavuz commented on SPARK-10889: - In addition, KCL 1.4.0 supports de-aggregation of records

[jira] [Updated] (SPARK-10599) Decrease communication in BlockMatrix multiply and increase performance

2015-09-14 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-10599: Description: The BlockMatrix multiply sends each block to all the corresponding columns

[jira] [Created] (SPARK-10599) Decrease communication in BlockMatrix multiply and increase performance

2015-09-14 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-10599: --- Summary: Decrease communication in BlockMatrix multiply and increase performance Key: SPARK-10599 URL: https://issues.apache.org/jira/browse/SPARK-10599 Project: Spark

Re: Adding/subtracting org.apache.spark.mllib.linalg.Vector in Scala?

2015-09-09 Thread Burak Yavuz
ES PLEASE! > > :))) > > On Tue, Aug 25, 2015 at 1:57 PM, Burak Yavuz <brk...@gmail.com> wrote: > >> Hmm. I have a lot of code on the local linear algebra operations using >> Spark's Matrix and Vector representations >> done for https://issues.apache.org/jira/bro

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-03 Thread Burak Yavuz
+1. Tested complex R package support (Scala + R code), BLAS and DataFrame fixes good. Burak On Thu, Sep 3, 2015 at 8:56 AM, mkhaitman wrote: > Built and tested on CentOS 7, Hadoop 2.7.1 (Built for 2.6 profile), > Standalone without any problems. Re-tested dynamic

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 02f708: Translated using Weblate (Albanian)

2015-09-01 Thread Burak Yavuz
.po Log Message: --- Translated using Weblate (Italian) Currently translated at 100.0% (3209 of 3209 strings) [CI skip] Commit: ed21004fdc9a62ca57955780c00fefc198768cde https://github.com/phpmyadmin/phpmyadmin/commit/ed21004fdc9a62ca57955780c00fefc198768cde Author: Burak Yavuz <hitowerdi..

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 9656d6: Translated using Weblate (Turkish)

2015-08-31 Thread Burak Yavuz
Branch: refs/heads/QA_4_5 Home: https://github.com/phpmyadmin/phpmyadmin Commit: 9656d600b992c275a7179414a70990e63a828823 https://github.com/phpmyadmin/phpmyadmin/commit/9656d600b992c275a7179414a70990e63a828823 Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015

[jira] [Updated] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication

2015-08-29 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-10353: Affects Version/s: 1.5.0 MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose

[jira] [Created] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication

2015-08-29 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-10353: --- Summary: MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication Key: SPARK-10353 URL: https://issues.apache.org/jira/browse/SPARK-10353

Re: Calculating Min and Max Values using Spark Transformations?

2015-08-28 Thread Burak Yavuz
Or you can just call describe() on the dataframe? In addition to min-max, you'll also get the mean, and count of non-null and non-NA elements as well. Burak On Fri, Aug 28, 2015 at 10:09 AM, java8964 java8...@hotmail.com wrote: Or RDD.max() and RDD.min() won't work for you? Yong

[phpMyAdmin Git] [phpmyadmin/localized_docs] ee9be6: Translated using Weblate (Turkish)

2015-08-27 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/localized_docs Commit: ee9be6b3473bb3b71f639bfb36be1dba2e55c0a2 https://github.com/phpmyadmin/localized_docs/commit/ee9be6b3473bb3b71f639bfb36be1dba2e55c0a2 Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-08

[phpMyAdmin Git] [phpmyadmin/localized_docs] e6440d: Translated using Weblate (Turkish)

2015-08-26 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/localized_docs Commit: e6440d74dd9e8d50d45de6055a984de582d29684 https://github.com/phpmyadmin/localized_docs/commit/e6440d74dd9e8d50d45de6055a984de582d29684 Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-08

Re: Adding/subtracting org.apache.spark.mllib.linalg.Vector in Scala?

2015-08-25 Thread Burak Yavuz
Hmm. I have a lot of code on the local linear algebra operations using Spark's Matrix and Vector representations done for https://issues.apache.org/jira/browse/SPARK-6442. I can make a Spark package with that code if people are interested. Best, Burak On Tue, Aug 25, 2015 at 10:54 AM, Kristina

Re: Unable to catch SparkContext methods exceptions

2015-08-24 Thread Burak Yavuz
textFile is a lazy operation. It doesn't evaluate until you call an action on it, such as .count(). Therefore, you won't catch the exception there. Best, Burak On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio roberto.coluc...@gmail.com wrote: Hello folks, I'm experiencing an unexpected
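
The same effect can be reproduced without Spark. The sketch below is only an analogy: a Python generator plays the role of the lazy `textFile` call, and the name `read_lines` and the missing file path are hypothetical. The `try` around the definition catches nothing, because no file is opened until the lines are actually consumed.

```python
import os
import tempfile


def read_lines(path):
    """Lazily yield lines; the file is opened only on first iteration."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


# A path that is guaranteed not to exist.
missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")

caught_at_definition = False
try:
    lines = read_lines(missing_path)  # lazy: nothing is opened yet
except FileNotFoundError:
    caught_at_definition = True

caught_at_action = False
try:
    count = sum(1 for _ in lines)  # the "action": iteration opens the file
except FileNotFoundError:
    caught_at_action = True
```

Only the second `try` block sees the error, which mirrors why the exception surfaces at the action (such as `.count()`) rather than at `textFile`.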

Re: Unable to catch SparkContext methods exceptions

2015-08-24 Thread Burak Yavuz
to evaluate the actual result), and there I can observe and catch the exception. Even considering Spark's laziness, shouldn't I catch the exception while occurring in the try..catch statement that encloses the textFile invocation? Best, Roberto On Mon, Aug 24, 2015 at 7:38 PM, Burak Yavuz brk

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 70b5c9: Translated using Weblate (Turkish)

2015-08-22 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 70b5c9ddeb96b00090e24de363464b3058dffa41 https://github.com/phpmyadmin/phpmyadmin/commit/70b5c9ddeb96b00090e24de363464b3058dffa41 Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-08-22 (Sat

Re: Convert mllib.linalg.Matrix to Breeze

2015-08-21 Thread Burak Yavuz
:50 PM, Burak Yavuz wrote: Matrix.toBreeze is a private method. MLlib matrices have the same structure as Breeze Matrices. Just create a new Breeze matrix like this https://github.com/apache/spark/blob/43e0135421b2262cbb0e06aae53523f663b4f959/mllib/src/main/scala/org/apache/spark/mllib/linalg

Re: Convert mllib.linalg.Matrix to Breeze

2015-08-20 Thread Burak Yavuz
Matrix.toBreeze is a private method. MLlib matrices have the same structure as Breeze Matrices. Just create a new Breeze matrix like this https://github.com/apache/spark/blob/43e0135421b2262cbb0e06aae53523f663b4f959/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala#L270 . Best,

Re: Creating Spark DataFrame from large pandas DataFrame

2015-08-20 Thread Burak Yavuz
If you would like to try using spark-csv, please use `pyspark --packages com.databricks:spark-csv_2.11:1.2.0` You're missing a dependency. Best, Burak On Thu, Aug 20, 2015 at 1:08 PM, Charlie Hack charles.t.h...@gmail.com wrote: Hi, I'm new to spark and am trying to create a Spark df from a

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] eb4b3a: Translated using Weblate (Turkish)

2015-08-15 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: eb4b3a9ce6d83d3a990b0a8cf0def9d92de0af1e https://github.com/phpmyadmin/phpmyadmin/commit/eb4b3a9ce6d83d3a990b0a8cf0def9d92de0af1e Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-08-15 (Sat

Re: Unit Testing

2015-08-13 Thread Burak Yavuz
I would recommend this spark package for your unit testing needs ( http://spark-packages.org/package/holdenk/spark-testing-base). Best, Burak On Thu, Aug 13, 2015 at 5:51 AM, jay vyas jayunit100.apa...@gmail.com wrote: yes there certainly is, so long as eclipse has the right plugins and so on

[jira] [Created] (SPARK-9916) Clear leftover sparkr.zip copies and creations (e.g. make-distribution.sh)

2015-08-12 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9916: -- Summary: Clear leftover sparkr.zip copies and creations (e.g. make-distribution.sh) Key: SPARK-9916 URL: https://issues.apache.org/jira/browse/SPARK-9916 Project: Spark

[phpMyAdmin Git] [phpmyadmin/phpmyadmin] 88010e: Translated using Weblate (Interlingua)

2015-08-09 Thread Burak Yavuz
: cd80cb61a09618d14a60f6d2f719494924624190 https://github.com/phpmyadmin/phpmyadmin/commit/cd80cb61a09618d14a60f6d2f719494924624190 Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-08-09 (Sun, 09 Aug 2015) Changed paths: M po/tr.po Log Message: --- Translated using Weblate

[jira] [Commented] (SPARK-9742) NullPointerException when using --packages

2015-08-07 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662260#comment-14662260 ] Burak Yavuz commented on SPARK-9742: Did the behavior of Option's change for some

[jira] [Commented] (SPARK-9614) InternalRow representation during executionPlan.toRdd.aggregete possibly problematic

2015-08-06 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661309#comment-14661309 ] Burak Yavuz commented on SPARK-9614: It used to work in Spark 1.4, without Tungsten. I

[jira] [Created] (SPARK-9615) Use rdd.aggregate in FrequentItems

2015-08-04 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9615: -- Summary: Use rdd.aggregate in FrequentItems Key: SPARK-9615 URL: https://issues.apache.org/jira/browse/SPARK-9615 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-9614) InternalRow representation during executionPlan.toRdd.aggregete possibly problematic

2015-08-04 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9614: -- Summary: InternalRow representation during executionPlan.toRdd.aggregete possibly problematic Key: SPARK-9614 URL: https://issues.apache.org/jira/browse/SPARK-9614

[jira] [Commented] (SPARK-9614) InternalRow representation during executionPlan.toRdd.aggregete possibly problematic

2015-08-04 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654402#comment-14654402 ] Burak Yavuz commented on SPARK-9614: cc [~joshrosen] InternalRow representation

[jira] [Created] (SPARK-9616) Erroneous result in Frequent Items (SQL) when merging FrequentItemCounters

2015-08-04 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9616: -- Summary: Erroneous result in Frequent Items (SQL) when merging FrequentItemCounters Key: SPARK-9616 URL: https://issues.apache.org/jira/browse/SPARK-9616 Project: Spark

[jira] [Created] (SPARK-9603) Re-enable complex R package test in SparkSubmitSuite

2015-08-04 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9603: -- Summary: Re-enable complex R package test in SparkSubmitSuite Key: SPARK-9603 URL: https://issues.apache.org/jira/browse/SPARK-9603 Project: Spark Issue Type

Re: Cannot Import Package (spark-csv)

2015-08-03 Thread Burak Yavuz
Hi, there was this issue for Scala 2.11. https://issues.apache.org/jira/browse/SPARK-7944 It should be fixed on master branch. You may be hitting that. Best, Burak On Sun, Aug 2, 2015 at 9:06 PM, Ted Yu yuzhih...@gmail.com wrote: I tried the following command on master branch: bin/spark-shell

Re: Cannot Import Package (spark-csv)

2015-08-03 Thread Burak Yavuz
In addition, you do not need to use --jars with --packages. --packages will get the jar for you. Best, Burak On Mon, Aug 3, 2015 at 9:01 AM, Burak Yavuz brk...@gmail.com wrote: Hi, there was this issue for Scala 2.11. https://issues.apache.org/jira/browse/SPARK-7944 It should be fixed

Re: FrequentItems in spark-sql-execution-stat

2015-08-01 Thread Burak Yavuz
Hi Yucheng, Thanks for pointing out the issue. You are correct, in the case that the final map is completely empty after the merge, we do need to add the final element to the map, with the correct count (decrement the count with the max count that was already in the map). I'll submit a fix for

Re: Which directory contains third party libraries for Spark

2015-07-28 Thread Burak Yavuz
Hey Stephen, In case these libraries exist on the client as a form of Maven library, you can use --packages to ship the library and all its dependencies, without building an uber jar. Best, Burak On Tue, Jul 28, 2015 at 10:23 AM, Marcelo Vanzin van...@cloudera.com wrote: Hi Stephen, There

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] cb77ab: Translated using Weblate (Turkish)

2015-07-23 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: cb77ab1e50fb5e3daa92faf96e34b26a1d2d109b https://github.com/phpmyadmin/phpmyadmin/commit/cb77ab1e50fb5e3daa92faf96e34b26a1d2d109b Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-07-23 (Thu

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-22 Thread Burak Yavuz
Hi Jonathan, I believe calling persist with StorageLevel.NONE doesn't do anything. That's why the unpersist has an if statement before it. Could you give more information about your setup please? Number of cores, memory, number of partitions of ratings_train? Thanks, Burak On Wed, Jul 22, 2015

[jira] [Created] (SPARK-9263) Add Spark Submit flag to exclude dependencies when using --packages

2015-07-22 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-9263: -- Summary: Add Spark Submit flag to exclude dependencies when using --packages Key: SPARK-9263 URL: https://issues.apache.org/jira/browse/SPARK-9263 Project: Spark

Re: LinearRegressionWithSGD Outputs NaN

2015-07-21 Thread Burak Yavuz
Hi, Could you please decrease your step size to 0.1, and also try 0.01? You could also try running L-BFGS, which doesn't have step size tuning, to get better results. Best, Burak On Tue, Jul 21, 2015 at 2:59 AM, Naveen nav...@formcept.com wrote: Hi , I am trying to use
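
Why a smaller step size can turn NaN into a usable result is easy to see on a toy problem. The sketch below is not MLlib's LinearRegressionWithSGD; it is plain gradient descent on a single made-up data point (x = 3, y = 3, so the exact slope is 1.0), and the function name `fit_slope` is hypothetical. With a step size of 1.0 each update overshoots by a growing amount until the value overflows to infinity and then becomes NaN; with 0.1 or 0.01 the updates shrink toward the answer.

```python
import math


def fit_slope(step_size, iterations=1000):
    """Gradient descent for y = w * x on one point; returns the final w."""
    x, y = 3.0, 3.0  # single training point; the exact answer is w = 1.0
    w = 0.0
    for _ in range(iterations):
        gradient = (w * x - y) * x  # derivative of 0.5 * (w*x - y)**2
        w -= step_size * gradient
    return w


diverged = fit_slope(1.0)    # error is multiplied by -8 every step
converged = fit_slope(0.1)   # error is multiplied by 0.1 every step
slower = fit_slope(0.01)     # error is multiplied by 0.91 every step
```

Here `math.isnan(diverged)` is true, while the other two runs end up at 1.0 to within floating-point precision.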

Re: RowId unique key for Dataframes

2015-07-21 Thread Burak Yavuz
Would monotonicallyIncreasingId https://github.com/apache/spark/blob/d4c7a7a3642a74ad40093c96c4bf45a62a470605/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L637 work for you? Best, Burak On Tue, Jul 21, 2015 at 4:55 PM, Srikanth srikanth...@gmail.com wrote: Hello, I'm

Re: BlockMatrix multiplication

2015-07-17 Thread Burak Yavuz
shuffling given the blocks co-location? Best regards, Alexander *From:* Burak Yavuz [mailto:brk...@gmail.com] *Sent:* Wednesday, July 15, 2015 3:29 PM *To:* Ulanov, Alexander *Cc:* Rakesh Chalasani; dev@spark.apache.org *Subject:* Re: BlockMatrix multiplication Hi Alexander, I just

Re: Strange behavoir of pyspark with --jars option

2015-07-15 Thread Burak Yavuz
Hi, I believe the HiveContext uses a different class loader. It then falls back to the system class loader if it can't find the classes in the context class loader. The system class loader contains the classpath passed through --driver-class-path and spark.executor.extraClassPath. The JVM is

Re: MLlib LogisticRegressionWithLBFGS error

2015-07-15 Thread Burak Yavuz
Hi, Is this in LibSVM format? If so, the indices should be sorted in increasing order. It seems like they are not sorted. Best, Burak On Tue, Jul 14, 2015 at 7:31 PM, Vi Ngo Van ngovi.se@gmail.com wrote: Hi All, I've met an issue with MLlib when I use LogisticRegressionWithLBFGS my
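
If the input file does turn out to have unsorted indices, one way to check and repair a line is sketched below. It assumes the usual `label index:value index:value ...` layout; the helper names `has_increasing_indices` and `sort_libsvm_line` are hypothetical and not part of any Spark API.

```python
def has_increasing_indices(line):
    """Return True if the feature indices on a LibSVM line strictly increase."""
    indices = [int(pair.split(":", 1)[0]) for pair in line.split()[1:]]
    return all(a < b for a, b in zip(indices, indices[1:]))


def sort_libsvm_line(line):
    """Return the same line with its index:value pairs sorted by index."""
    label, *pairs = line.split()
    parsed = [(int(index), value)
              for index, value in (pair.split(":", 1) for pair in pairs)]
    parsed.sort(key=lambda item: item[0])
    # Values are kept as the original strings, so no precision is lost.
    return " ".join([label] + [f"{index}:{value}" for index, value in parsed])
```

For example, `sort_libsvm_line("1 3:0.5 1:2.0 2:1.5")` gives `"1 1:2.0 2:1.5 3:0.5"`, which then passes `has_increasing_indices`.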

Re: creating a distributed index

2015-07-15 Thread Burak Yavuz
Hi Swetha, IndexedRDD is available as a package on Spark Packages http://spark-packages.org/package/amplab/spark-indexedrdd. Best, Burak On Tue, Jul 14, 2015 at 5:23 PM, swetha swethakasire...@gmail.com wrote: Hi Ankur, Is IndexedRDD available in Spark 1.4.0? We would like to use this in

Re: BlockMatrix multiplication

2015-07-15 Thread Burak Yavuz
() - t) / 1e9) Best regards, Alexander *From:* Ulanov, Alexander *Sent:* Tuesday, July 14, 2015 6:24 PM *To:* 'Burak Yavuz' *Cc:* Rakesh Chalasani; dev@spark.apache.org *Subject:* RE: BlockMatrix multiplication Hi Burak, Thank you for explanation! I will try to make a diagonal

Re: Running mllib from R in Spark 1.4

2015-07-15 Thread Burak Yavuz
Hi, There is no MLlib support in SparkR in 1.4. There will be some support in 1.5. You can check these JIRAs for progress: https://issues.apache.org/jira/browse/SPARK-6805 https://issues.apache.org/jira/browse/SPARK-6823 Best, Burak On Wed, Jul 15, 2015 at 6:00 AM, madhu phatak

Re: BlockMatrix multiplication

2015-07-14 Thread Burak Yavuz
Hi Alexander, From your example code, using the GridPartitioner, you will have 1 column, and 5 rows. When you perform an A^T^A multiplication, you will generate a separate GridPartitioner with 5 columns and 5 rows. Therefore you are observing a huge shuffle. If you would generate a diagonal-block

Re: To access elements of a org.apache.spark.mllib.linalg.Vector

2015-07-14 Thread Burak Yavuz
Hi Dan, You could zip the indices with the values if you like.
```
val sVec = sparseVector(1).asInstanceOf[org.apache.spark.mllib.linalg.SparseVector]
val map = sVec.indices.zip(sVec.values).toMap
```
Best, Burak On Tue, Jul 14, 2015 at 12:23 PM, Dan Dong dongda...@gmail.com wrote: Hi,
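
For readers who want to try the zip-into-a-map idea without a Spark shell, here is a plain-Python analogue. It assumes a sparse vector is represented by two parallel sequences, `indices` and `values`, which matches the two fields used in the Scala snippet; the function name `sparse_to_map` is hypothetical.

```python
def sparse_to_map(indices, values):
    """Pair each stored index with its value, giving an index -> value lookup."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return dict(zip(indices, values))


# Example: a vector whose non-zero entries sit at positions 0, 3 and 7.
lookup = sparse_to_map([0, 3, 7], [1.0, 2.5, 4.0])
```

Positions that are not stored are implicitly zero, so `lookup.get(5, 0.0)` is a convenient way to read any element.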

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Burak Yavuz
On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz brk...@gmail.com wrote: Hi, How are you running K-Means? What is your k? What is the dimension of your dataset (columns)? Which Spark version are you using? Thanks, Burak On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando nir...@wso2.com wrote

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Burak Yavuz
, Nirmal Fernando nir...@wso2.com wrote: I'm using: org.apache.spark.mllib.clustering.KMeans.train(data.rdd(), 3, 20); CPU cores: 8 (using default Spark conf though) On partitions, I'm not sure how to find that. On Mon, Jul 13, 2015 at 11:30 PM, Burak Yavuz brk...@gmail.com wrote: What

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Burak Yavuz
Hi, How are you running K-Means? What is your k? What is the dimension of your dataset (columns)? Which Spark version are you using? Thanks, Burak On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando nir...@wso2.com wrote: Hi, For a fairly large dataset, 30MB, KMeansModel.computeCost takes lot

[Phpmyadmin-git] [phpmyadmin/localized_docs] e58f00: Translated using Weblate (Turkish)

2015-07-10 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/localized_docs Commit: e58f0054418b0442a88effe9437e503d7efc339f https://github.com/phpmyadmin/localized_docs/commit/e58f0054418b0442a88effe9437e503d7efc339f Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-07

Re: Unit tests of spark application

2015-07-10 Thread Burak Yavuz
I can +1 Holden's spark-testing-base package. Burak On Fri, Jul 10, 2015 at 12:23 PM, Holden Karau hol...@pigscanfly.ca wrote: Somewhat biased of course, but you can also use spark-testing-base from spark-packages.org as a basis for your unittests. On Fri, Jul 10, 2015 at 12:03 PM, Daniel

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 92a63d: Translated using Weblate (Turkish)

2015-07-10 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 92a63d2cc3cff46b9a5ce087162557c2bb5c729e https://github.com/phpmyadmin/phpmyadmin/commit/92a63d2cc3cff46b9a5ce087162557c2bb5c729e Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-07-10 (Fri

Re: How to ignore features in mllib

2015-07-09 Thread Burak Yavuz
If you use the Pipelines Api with DataFrames, you select which columns you would like to train on using the VectorAssembler. While using the VectorAssembler, you can choose not to select some features if you like. Best, Burak On Thu, Jul 9, 2015 at 10:38 AM, Arun Luthra arun.lut...@gmail.com

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-09 Thread Burak Yavuz
+1 nonbinding. On Thu, Jul 9, 2015 at 7:38 AM, Sean Owen so...@cloudera.com wrote: +1 nonbinding. All previous RC issues appear resolved. All tests pass with the -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver invocation. Signatures et al are OK. On Thu, Jul 9, 2015 at 6:55 AM, Patrick

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 3a3c49: Translated using Weblate (Turkish)

2015-07-08 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 3a3c4969ada9bb641abc492e497710287ce3241c https://github.com/phpmyadmin/phpmyadmin/commit/3a3c4969ada9bb641abc492e497710287ce3241c Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-07-08 (Wed

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] f4f94f: Translated using Weblate (Turkish)

2015-07-08 Thread Burak Yavuz
Branch: refs/heads/QA_4_4 Home: https://github.com/phpmyadmin/phpmyadmin Commit: f4f94fc4fcac477b229feb3eaf30e0aa74d03fb7 https://github.com/phpmyadmin/phpmyadmin/commit/f4f94fc4fcac477b229feb3eaf30e0aa74d03fb7 Author: Burak Yavuz hitowerdi...@hotmail.com Date: 2015-07-08 (Wed

Re: spark-submit can not resolve spark-hive_2.10

2015-07-07 Thread Burak Yavuz
spark-hive is excluded when using --packages, because it can be included in the spark-assembly by adding -Phive during mvn package or sbt assembly. Best, Burak On Tue, Jul 7, 2015 at 8:06 AM, Hao Ren inv...@gmail.com wrote: I want to add spark-hive as a dependence to submit my job, but it

[jira] [Updated] (SPARK-6442) MLlib 1.4 Local Linear Algebra Package

2015-07-04 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-6442: --- Description: MLlib's local linear algebra package doesn't have any support for any type of matrix

Re: Spark 1.4 MLLib Bug?: Multiclass Classification requirement failed: sizeInBytes was negative

2015-07-03 Thread Burak Yavuz
How many partitions do you have? It might be that one partition is too large, and there is Integer overflow. Could you double your number of partitions? Burak On Fri, Jul 3, 2015 at 4:41 AM, Danny kont...@dannylinden.de wrote: hi, i want to run a multiclass classification with 390 classes
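
The reply only suggests integer overflow as a possible cause, but the mechanism itself is easy to illustrate. The sketch below shows how a byte count larger than 2^31 - 1 turns negative when stored in a 32-bit signed integer, and how splitting the same data over twice as many partitions brings each size back into range. The helper names and the 3,000,000,000-byte figure are made up for the example.

```python
INT32_MAX = 2**31 - 1


def as_int32(n):
    """Wrap an arbitrary integer the way a 32-bit signed int would."""
    return (n + 2**31) % 2**32 - 2**31


def bytes_per_partition(total_bytes, num_partitions):
    """Size of the largest partition when data is split evenly (ceiling)."""
    return -(-total_bytes // num_partitions)


total = 3_000_000_000  # hypothetical data size, larger than INT32_MAX

one_partition = as_int32(bytes_per_partition(total, 1))   # wraps negative
two_partitions = as_int32(bytes_per_partition(total, 2))  # fits in 32 bits
```

With one partition the wrapped size is negative; with two partitions each holds 1,500,000,000 bytes, which is below `INT32_MAX`.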

[jira] [Created] (SPARK-8803) Crosstab element's can't contain null's and back ticks

2015-07-02 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-8803: -- Summary: Crosstab element's can't contain null's and back ticks Key: SPARK-8803 URL: https://issues.apache.org/jira/browse/SPARK-8803 Project: Spark Issue Type

Re: coalesce on dataFrame

2015-07-01 Thread Burak Yavuz
You can use df.repartition(1) in Spark 1.4. See here https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1396 . Best, Burak On Wed, Jul 1, 2015 at 3:05 AM, Olivier Girardot ssab...@gmail.com wrote: PySpark or Spark (scala) ? When you use

Re: Can Dependencies Be Resolved on Spark Cluster?

2015-06-30 Thread Burak Yavuz
, Jun 29, 2015 at 11:33 PM, SLiZn Liu sliznmail...@gmail.com wrote: Hi Burak, Is `--package` flag only available for maven, no sbt support? On Tue, Jun 30, 2015 at 2:26 PM Burak Yavuz brk...@gmail.com wrote: You can pass `--packages your:comma-separated:maven-dependencies` to spark submit

Re: Can Dependencies Be Resolved on Spark Cluster?

2015-06-30 Thread Burak Yavuz
You can pass `--packages your:comma-separated:maven-dependencies` to spark submit if you have Spark 1.3 or greater. Best regards, Burak On Mon, Jun 29, 2015 at 10:46 PM, SLiZn Liu sliznmail...@gmail.com wrote: Hey Spark Users, I'm writing a demo with Spark and HBase. What I've done is

[jira] [Commented] (SPARK-8599) Use a Random operator to handle Random distribution generating expressions

2015-06-29 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605777#comment-14605777 ] Burak Yavuz commented on SPARK-8599: It would be great if it works for this case

[jira] [Commented] (SPARK-8410) Hive VersionsSuite RuntimeException

2015-06-29 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605977#comment-14605977 ] Burak Yavuz commented on SPARK-8410: Hi Joe, Is it possible to delete those files

[jira] [Commented] (SPARK-8475) SparkSubmit with Ivy jars is very slow to load with no internet access

2015-06-29 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605968#comment-14605968 ] Burak Yavuz commented on SPARK-8475: ping. I think you can go ahead with a PR

[jira] [Commented] (SPARK-8410) Hive VersionsSuite RuntimeException

2015-06-29 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606080#comment-14606080 ] Burak Yavuz commented on SPARK-8410: Hi Joe, Could you please check whether https

[jira] [Created] (SPARK-8715) ArrayOutOfBoundsException for DataFrameStatSuite.crosstab

2015-06-29 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-8715: -- Summary: ArrayOutOfBoundsException for DataFrameStatSuite.crosstab Key: SPARK-8715 URL: https://issues.apache.org/jira/browse/SPARK-8715 Project: Spark Issue

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 68116a: Translated using Weblate (Turkish)

2015-06-28 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/phpmyadmin Commit: 68116ae860346108aa0850ffcd49c5241759562c https://github.com/phpmyadmin/phpmyadmin/commit/68116ae860346108aa0850ffcd49c5241759562c Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015-06-29 (Mon

[jira] [Created] (SPARK-8681) crosstab column names in wrong order

2015-06-27 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-8681: -- Summary: crosstab column names in wrong order Key: SPARK-8681 URL: https://issues.apache.org/jira/browse/SPARK-8681 Project: Spark Issue Type: Sub-task

[Phpmyadmin-git] [phpmyadmin/localized_docs] 1e0bb6: Translated using Weblate (Turkish)

2015-06-26 Thread Burak Yavuz
Branch: refs/heads/master Home: https://github.com/phpmyadmin/localized_docs Commit: 1e0bb6be29e91e6e47ad7f95169c4d1c9d92cfe7 https://github.com/phpmyadmin/localized_docs/commit/1e0bb6be29e91e6e47ad7f95169c4d1c9d92cfe7 Author: Burak Yavuz <hitowerdi...@hotmail.com> Date: 2015-06

[jira] [Created] (SPARK-8608) After initializing a DataFrame with random columns and a seed, df.show should return same value

2015-06-24 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-8608: -- Summary: After initializing a DataFrame with random columns and a seed, df.show should return same value Key: SPARK-8608 URL: https://issues.apache.org/jira/browse/SPARK-8608

[jira] [Created] (SPARK-8609) After initializing a DataFrame with random columns and a seed, ordering by that random column should return same sorted order

2015-06-24 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-8609: -- Summary: After initializing a DataFrame with random columns and a seed, ordering by that random column should return same sorted order Key: SPARK-8609 URL: https://issues.apache.org

[jira] [Commented] (SPARK-8599) Use a Random operator to handle Random distribution generating expressions

2015-06-24 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600315#comment-14600315 ] Burak Yavuz commented on SPARK-8599: cc [~marmbrus] [~rxin] Use a Random operator

Re: Understanding accumulator during transformations

2015-06-24 Thread Burak Yavuz
Hi Wei, For example, when a straggler executor gets killed in the middle of a map operation and its task is restarted at a different instance, the accumulator will be updated more than once. Best, Burak On Wed, Jun 24, 2015 at 1:08 PM, Wei Zhou zhweisop...@gmail.com wrote: Quoting from Spark
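
The double-counting described here can be illustrated without a cluster. This is a pure-Python simulation, not Spark code: `Accumulator` and `map_task` are hypothetical stand-ins, and the sketch assumes the updates made by the first attempt have already been applied by the time the task is re-executed on another instance.

```python
# Pure-Python simulation; `Accumulator` and `map_task` are hypothetical
# stand-ins, not Spark classes.
class Accumulator:
    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


def map_task(records, acc):
    """A map-style task that bumps the accumulator once per record."""
    out = []
    for r in records:
        acc.add(1)          # side effect inside the transformation
        out.append(r * 2)
    return out


records = [1, 2, 3, 4]
acc = Accumulator()

map_task(records, acc)            # first attempt; assume its output is then lost
result = map_task(records, acc)   # the task is re-executed on another instance

# The map output is correct, but the counter saw every record twice.
print(len(records), acc.value)
```

The output of the transformation is unaffected by the retry; only the side-effecting counter drifts, which is why counts taken inside transformations are best treated as approximate.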

Re: Understanding accumulator during transformations

2015-06-24 Thread Burak Yavuz
the transformation ended up updating accumulator more than once? Best, Wei 2015-06-24 13:23 GMT-07:00 Burak Yavuz brk...@gmail.com: Hi Wei, For example, when a straggler executor gets killed in the middle of a map operation and its task is restarted at a different instance, the accumulator

Re: [GraphX] Graph 500 graph generator

2015-06-24 Thread Burak Yavuz
Hi Ryan, If you can get past the paperwork, I'm sure this can make a great Spark Package (http://spark-packages.org). People can then use it for benchmarking purposes, and I'm sure people will be looking for graph generators! Best, Burak On Wed, Jun 24, 2015 at 7:55 AM, Carr, J. Ryan
