[jira] [Updated] (SPARK-2469) Lower shuffle compression buffer memory usage
[ https://issues.apache.org/jira/browse/SPARK-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2469: --- Summary: Lower shuffle compression buffer memory usage (was: Lower shuffle compression memory usage) > Lower shuffle compression buffer memory usage > - > > Key: SPARK-2469 > URL: https://issues.apache.org/jira/browse/SPARK-2469 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Reporter: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2469) Lower shuffle compression memory usage
Reynold Xin created SPARK-2469: -- Summary: Lower shuffle compression memory usage Key: SPARK-2469 URL: https://issues.apache.org/jira/browse/SPARK-2469 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2467) Revert SparkBuild to publish-local to both .m2 and .ivy2.
[ https://issues.apache.org/jira/browse/SPARK-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060381#comment-14060381 ] Takuya Ueshin commented on SPARK-2467: -- PRed: https://github.com/apache/spark/pull/1398 > Revert SparkBuild to publish-local to both .m2 and .ivy2. > - > > Key: SPARK-2467 > URL: https://issues.apache.org/jira/browse/SPARK-2467 > Project: Spark > Issue Type: Bug >Reporter: Takuya Ueshin > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2468) zero-copy shuffle network communication
Reynold Xin created SPARK-2468: -- Summary: zero-copy shuffle network communication Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
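For illustration, here is a minimal Scala sketch of the FileChannel.transferTo approach described above. This is not Spark's actual shuffle code; the file and target address are hypothetical.
{code}
import java.io.{File, FileInputStream}
import java.net.InetSocketAddress
import java.nio.channels.SocketChannel

// Stream a shuffle file from the page cache straight to the socket.
// transferTo keeps the copy in kernel space instead of bouncing the bytes
// through JVM heap buffers (and creating GC pressure).
def sendFileZeroCopy(file: File, target: InetSocketAddress): Unit = {
  val fileChannel = new FileInputStream(file).getChannel
  val socketChannel = SocketChannel.open(target)
  try {
    val size = fileChannel.size()
    var position = 0L
    // transferTo may transfer fewer bytes than requested, so loop until done.
    while (position < size) {
      position += fileChannel.transferTo(position, size - position, socketChannel)
    }
  } finally {
    fileChannel.close()
    socketChannel.close()
  }
}
{code}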
[jira] [Updated] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2468: --- Component/s: Shuffle > zero-copy shuffle network communication > --- > > Key: SPARK-2468 > URL: https://issues.apache.org/jira/browse/SPARK-2468 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Critical > > Right now shuffle send goes through the block manager. This is inefficient > because it requires loading a block from disk into a kernel buffer, then into > a user space buffer, and then back to a kernel send buffer before it reaches > the NIC. It does multiple copies of the data and context switching between > kernel/user. It also creates unnecessary buffer in the JVM that increases GC > Instead, we should use FileChannel.transferTo, which handles this in the > kernel space with zero-copy. See > http://www.ibm.com/developerworks/library/j-zerocopy/ > One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2467) Revert SparkBuild to publish-local to both .m2 and .ivy2.
Takuya Ueshin created SPARK-2467: Summary: Revert SparkBuild to publish-local to both .m2 and .ivy2. Key: SPARK-2467 URL: https://issues.apache.org/jira/browse/SPARK-2467 Project: Spark Issue Type: Bug Reporter: Takuya Ueshin -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2382) build error:
[ https://issues.apache.org/jira/browse/SPARK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060374#comment-14060374 ] Mukul Jain commented on SPARK-2382: --- BTW, I do believe that it is not a spark issue...but I wanted to see if spark documentation could be improved in someway to help through this issue or even better avoid it altogether. Mukul > build error: > - > > Key: SPARK-2382 > URL: https://issues.apache.org/jira/browse/SPARK-2382 > Project: Spark > Issue Type: Question > Components: Build >Affects Versions: 1.0.0 > Environment: Ubuntu 12.0.4 precise. > spark@ubuntu-cdh5-spark:~/spark-1.0.0$ mvn -version > Apache Maven 3.0.4 > Maven home: /usr/share/maven > Java version: 1.6.0_31, vendor: Sun Microsystems Inc. > Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.11.0-15-generic", arch: "amd64", family: "unix" >Reporter: Mukul Jain > Labels: newbie > > Unable to build. maven can't download dependency .. checked my http_proxy and > https_proxy setting they are working fine. Other http and https dependencies > were downloaded fine.. build process gets stuck always at this repo. manually > down loading also fails and receive an exception. > [INFO] > > [INFO] Building Spark Project External MQTT 1.0.0 > [INFO] > > Downloading: > https://repository.apache.org/content/repositories/releases/org/eclipse/paho/mqtt-client/0.4.0/mqtt-client-0.4.0.pom > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: I/O exception (java.net.ConnectException) caught when processing > request: Connection timed out > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: Retrying request -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2466) Got two different block manager registrations
Alex Gaudio created SPARK-2466: -- Summary: Got two different block manager registrations Key: SPARK-2466 URL: https://issues.apache.org/jira/browse/SPARK-2466 Project: Spark Issue Type: Bug Components: Block Manager, Mesos Affects Versions: 1.0.0 Environment: Mesos 0.19.0 Spark 1.0.0 (Ubuntu 14.04 LTS) Reporter: Alex Gaudio On PySpark and SparkR (haven't tried with Scala Spark) running on our Mesos cluster, we get the following error, which causes spark to fail. ``` ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140627-192758-654448812-5050-31629-42 ``` We believe this is because tasks between two different stages may share the same task id if they run within the same second. As a temporary workaround, we are adding a second of space between executions of lazily evaluated spark code. This appears to solve the problem. We don't see this issue running spark in local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
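A sketch of the temporary workaround described above, assuming an existing SparkContext `sc`; the jobs are hypothetical and the one-second pause is the reporter's stopgap, not a fix.
{code}
import org.apache.spark.SparkContext

def runJobsWithSpacing(sc: SparkContext): Unit = {
  val first = sc.parallelize(1 to 1000).count()    // first lazily evaluated job
  Thread.sleep(1000)                               // one second of spacing between jobs
  val second = sc.parallelize(1 to 1000).count()   // second job now starts in a different second
  println(s"counts: $first, $second")
}
{code}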
[jira] [Commented] (SPARK-2465) Use long as user / item ID for ALS
[ https://issues.apache.org/jira/browse/SPARK-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060300#comment-14060300 ] Xiangrui Meng commented on SPARK-2465: -- [~sowen] The ALS implementation shuffles data for each iteration. I tested ALS on 100x Amazon Reviews dataset. Each iteration shuffles about 200GB data (see the screenshot attached). If we switch to Long, ALS will definitely slow down. On the other hand, having a few hash collisions may not be a serious problem. That is essentially random dimensionality reduction and it also densifies the data, which helps ALS. We can estimate how many users/products we can handle if we allow 0.1% collision (should be couple million) and discuss more about the trade-offs. > Use long as user / item ID for ALS > -- > > Key: SPARK-2465 > URL: https://issues.apache.org/jira/browse/SPARK-2465 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.0.1 >Reporter: Sean Owen >Priority: Minor > Attachments: Screen Shot 2014-07-13 at 8.49.40 PM.png > > > I'd like to float this for consideration: use longs instead of ints for user > and product IDs in the ALS implementation. > The main reason for is that identifiers are not generally numeric at all, and > will be hashed to an integer. (This is a separate issue.) Hashing to 32 bits > means collisions are likely after hundreds of thousands of users and items, > which is not unrealistic. Hashing to 64 bits pushes this back to billions. > It would also mean numeric IDs that happen to be larger than the largest int > can be used directly as identifiers. > On the downside of course: 8 bytes instead of 4 bytes of memory used per > Rating. > Thoughts? I will post a PR so as to show what the change would be. -- This message was sent by Atlassian JIRA (v6.2#6252)
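As a rough illustration of the collision estimate mentioned above (a back-of-envelope reading of "0.1% collision", not a figure taken from the discussion): with n IDs hashed uniformly into m = 2^32 values, the chance that a given ID shares a hash with some other ID is roughly n/m.
{code}
val m = math.pow(2, 32)            // number of distinct 32-bit hash values
val allowedCollisionRate = 0.001   // allow ~0.1% of IDs to be involved in a collision
val n = allowedCollisionRate * m   // scale at which that rate is reached
println(f"~${n / 1e6}%.1f million users/items before ~0.1%% of them collide")  // ~4.3 million
{code}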
[jira] [Updated] (SPARK-2465) Use long as user / item ID for ALS
[ https://issues.apache.org/jira/browse/SPARK-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2465: - Attachment: Screen Shot 2014-07-13 at 8.49.40 PM.png > Use long as user / item ID for ALS > -- > > Key: SPARK-2465 > URL: https://issues.apache.org/jira/browse/SPARK-2465 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.0.1 >Reporter: Sean Owen >Priority: Minor > Attachments: Screen Shot 2014-07-13 at 8.49.40 PM.png > > > I'd like to float this for consideration: use longs instead of ints for user > and product IDs in the ALS implementation. > The main reason for is that identifiers are not generally numeric at all, and > will be hashed to an integer. (This is a separate issue.) Hashing to 32 bits > means collisions are likely after hundreds of thousands of users and items, > which is not unrealistic. Hashing to 64 bits pushes this back to billions. > It would also mean numeric IDs that happen to be larger than the largest int > can be used directly as identifiers. > On the downside of course: 8 bytes instead of 4 bytes of memory used per > Rating. > Thoughts? I will post a PR so as to show what the change would be. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-953) Latent Dirichlet Association (LDA model)
[ https://issues.apache.org/jira/browse/SPARK-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060291#comment-14060291 ] Masaki Rikitoku commented on SPARK-953: --- Latent Dirichlet Allocation? > Latent Dirichlet Association (LDA model) > > > Key: SPARK-953 > URL: https://issues.apache.org/jira/browse/SPARK-953 > Project: Spark > Issue Type: Story > Components: Examples >Affects Versions: 0.7.3 >Reporter: caizhua >Priority: Critical > > This code is for learning the LDA model. However, if our input is 2.5 M > documents per machine, a dictionary with 1 words, running in EC2 > m2.4xlarge instance with 68 G memory each machine. The time is really really > slow. For five iterations, the time cost is 8145, 24725, 51688, 58674, 56850 > seconds. The time for shuffling is quite slow. The LDA.tbl is the simulated > data set for the program, and it is quite fast. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2363) Clean MLlib's sample data files
[ https://issues.apache.org/jira/browse/SPARK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2363. -- Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1394 [https://github.com/apache/spark/pull/1394] > Clean MLlib's sample data files > --- > > Key: SPARK-2363 > URL: https://issues.apache.org/jira/browse/SPARK-2363 > Project: Spark > Issue Type: Task > Components: MLlib >Reporter: Xiangrui Meng >Priority: Minor > Fix For: 1.1.0 > > > MLlib has sample data under serveral folders: > 1) data/mllib > 2) data/ > 3) mllib/data/* > Per previous discussion with [~matei], we want to put them under `data/mllib` > and clean outdated files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2363) Clean MLlib's sample data files
[ https://issues.apache.org/jira/browse/SPARK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2363: - Assignee: Sean Owen > Clean MLlib's sample data files > --- > > Key: SPARK-2363 > URL: https://issues.apache.org/jira/browse/SPARK-2363 > Project: Spark > Issue Type: Task > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Sean Owen >Priority: Minor > Fix For: 1.1.0 > > > MLlib has sample data under serveral folders: > 1) data/mllib > 2) data/ > 3) mllib/data/* > Per previous discussion with [~matei], we want to put them under `data/mllib` > and clean outdated files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2354) BitSet Range Expanded when creating new one
[ https://issues.apache.org/jira/browse/SPARK-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060279#comment-14060279 ] Yijie Shen commented on SPARK-2354: --- For the methods currently available in BitSet, the two approaches have the same effect. But what if I want to implement a `complement` or `xnor` method? Since the Iterator's hasNext method only checks that nextSetBit's index is >= 0, iterating over the complement bitset would return indexes that are out of range, between numBits and capacity. > BitSet Range Expanded when creating new one > --- > > Key: SPARK-2354 > URL: https://issues.apache.org/jira/browse/SPARK-2354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 >Reporter: Yijie Shen >Priority: Minor > > BitSet has a constructor parameter named "numBits: Int" and indicate the bit > num inside. > And also, there is a function called "capacity" which represents the long > words number to hold the bits. > When creating new BitSet,for example in '|', I thought the new created one > shouldn't be the size of longer words' length, instead, it should be the > longer set's num of bit > {code}def |(other: BitSet): BitSet = { > val newBS = new BitSet(math.max(numBits, other.numBits)) > // I know by now the numBits isn't a field > {code} > Does it have any other reason to expand the BitSet range I don't know? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2278) groupBy & groupByKey should support custom comparator
[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060267#comment-14060267 ] Hans Uhlig commented on SPARK-2278: --- So I just checked with the current 1.0.0 API, and JavaPairRDD implements the following (there was no sortBy that I could find):
JavaPairRDD<K,V> JavaPairRDD.sortByKey()
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator<K> comp)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(boolean ascending)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator<K> comp, boolean ascending)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator<K> comp, boolean ascending, int numPartitions)
JavaPairRDD<K, Iterable<T>> JavaRDD.groupBy(Function<T,K> f)
JavaPairRDD<K, Iterable<T>> JavaRDD.groupBy(Function<T,K> f, int numPartitions)
JavaPairRDD<K, Iterable<V>> JavaPairRDD.groupByKey()
JavaPairRDD<K, Iterable<V>> JavaPairRDD.groupByKey(Partitioner partitioner)
JavaPairRDD<K, Iterable<V>> JavaPairRDD.groupByKey(int numPartitions)
The base non-implied-parameter functions should provide the following interfaces for optimum control and flexibility:
JavaRDD<T> JavaRDD.sortBy(Comparator<T> comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator<K> comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K, Iterable<T>> JavaRDD.groupBy(Function<T,K> func, Comparator<K> comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K, Iterable<V>> JavaPairRDD.groupByKey(Function func, Comparator<K> comp, boolean ascending, Partitioner partitioner, int numPartitions)
groupByKey's function reference should look something like "Iterable<V> Function(K key, Iterable<V> values)", unless there is a different function to do that particular job that I am missing. The lack of descriptions for what the inputs and outputs of the function references should do makes that a bit difficult to discern sometimes. > groupBy & groupByKey should support custom comparator > - > > Key: SPARK-2278 > URL: https://issues.apache.org/jira/browse/SPARK-2278 > Project: Spark > Issue Type: New Feature > Components: Java API >Affects Versions: 1.0.0 >Reporter: Hans Uhlig > > To maintain parity with MapReduce you should be able to specify a custom key > equality function in groupBy/groupByKey similar to sortByKey. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1945) Add full Java examples in MLlib docs
[ https://issues.apache.org/jira/browse/SPARK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060244#comment-14060244 ] Michael Yannakopoulos commented on SPARK-1945: -- I am willing to provide a java example for decision trees as well as to enhance the java example provided in the naive-bayes section. What is more, I would like to ask you if there is an equivalent class for scala/spark RowMatrix in the equivalent python api. This is because I would like to provide examples in the dimensionality reduction section of mllib documentation using python. Thanks, Michael > Add full Java examples in MLlib docs > > > Key: SPARK-1945 > URL: https://issues.apache.org/jira/browse/SPARK-1945 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Matei Zaharia > Labels: Starter > Fix For: 1.0.0 > > > Right now some of the Java tabs only say the following: > "All of MLlib’s methods use Java-friendly types, so you can import and call > them there the same way you do in Scala. The only caveat is that the methods > take Scala RDD objects, while the Spark Java API uses a separate JavaRDD > class. You can convert a Java RDD to a Scala one by calling .rdd() on your > JavaRDD object." > Would be nice to translate the Scala code into Java instead. > Also, a few pages (most notably the Matrix one) don't have Java examples at > all. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1945) Add full Java examples in MLlib docs
[ https://issues.apache.org/jira/browse/SPARK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060243#comment-14060243 ] Michael Yannakopoulos commented on SPARK-1945: -- Hello guys, I have provided Java examples for the following documentation files: mllib-clustering.md mllib-collaborative-filtering.md mllib-dimensionality-reduction.md mllib-linear-methods.md mllib-optimization.md My pull request is: [https://github.com/apache/spark/pull/1311] Enjoy and do not hesitate to contact me for any remark/correction. Thanks, Michael > Add full Java examples in MLlib docs > > > Key: SPARK-1945 > URL: https://issues.apache.org/jira/browse/SPARK-1945 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Matei Zaharia > Labels: Starter > Fix For: 1.0.0 > > > Right now some of the Java tabs only say the following: > "All of MLlib’s methods use Java-friendly types, so you can import and call > them there the same way you do in Scala. The only caveat is that the methods > take Scala RDD objects, while the Spark Java API uses a separate JavaRDD > class. You can convert a Java RDD to a Scala one by calling .rdd() on your > JavaRDD object." > Would be nice to translate the Scala code into Java instead. > Also, a few pages (most notably the Matrix one) don't have Java examples at > all. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2158) FileAppenderSuite is not cleaning up after itself
[ https://issues.apache.org/jira/browse/SPARK-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060201#comment-14060201 ] Mark Hamstra commented on SPARK-2158: - This is fixed at 4cb33a83e0 from https://github.com/apache/spark/pull/1100 > FileAppenderSuite is not cleaning up after itself > - > > Key: SPARK-2158 > URL: https://issues.apache.org/jira/browse/SPARK-2158 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Mark Hamstra >Assignee: Mark Hamstra >Priority: Trivial > Fix For: 1.1.0 > > > FileAppenderSuite is leaving behind the file core/stdout -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2158) FileAppenderSuite is not cleaning up after itself
[ https://issues.apache.org/jira/browse/SPARK-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamstra resolved SPARK-2158. - Resolution: Fixed > FileAppenderSuite is not cleaning up after itself > - > > Key: SPARK-2158 > URL: https://issues.apache.org/jira/browse/SPARK-2158 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Mark Hamstra >Assignee: Mark Hamstra >Priority: Trivial > Fix For: 1.1.0 > > > FileAppenderSuite is leaving behind the file core/stdout -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2363) Clean MLlib's sample data files
[ https://issues.apache.org/jira/browse/SPARK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060177#comment-14060177 ] Sean Owen commented on SPARK-2363: -- I made myself useful with a PR for this one -- yes good cleanup: https://github.com/apache/spark/pull/1394 > Clean MLlib's sample data files > --- > > Key: SPARK-2363 > URL: https://issues.apache.org/jira/browse/SPARK-2363 > Project: Spark > Issue Type: Task > Components: MLlib >Reporter: Xiangrui Meng >Priority: Minor > > MLlib has sample data under serveral folders: > 1) data/mllib > 2) data/ > 3) mllib/data/* > Per previous discussion with [~matei], we want to put them under `data/mllib` > and clean outdated files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1949) Servlet 2.5 vs 3.0 conflict in SBT build
[ https://issues.apache.org/jira/browse/SPARK-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1949. -- Resolution: Won't Fix Obsoleted by SBT build changes. > Servlet 2.5 vs 3.0 conflict in SBT build > > > Key: SPARK-1949 > URL: https://issues.apache.org/jira/browse/SPARK-1949 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.0.0 >Reporter: Sean Owen >Priority: Minor > > [~kayousterhout] mentioned that: > {quote} > I had some trouble compiling an application (Shark) against Spark 1.0, > where Shark had a runtime exception (at the bottom of this message) because > it couldn't find the javax.servlet classes. SBT seemed to have trouble > downloading the servlet APIs that are dependencies of Jetty (used by the > Spark web UI), so I had to manually add them to the application's build > file: > libraryDependencies += "org.mortbay.jetty" % "servlet-api" % "3.0.20100224" > Not exactly sure why this happens but thought it might be useful in case > others run into the same problem. > {quote} > This is a symptom of Servlet API conflict which we battled in the Maven > build. The resolution is to nix Servlet 2.5 and odd old Jetty / Netty 3.x > dependencies. It looks like the Hive part of the assembly in the SBT build > doesn't exclude all these entirely. > I'll open a suggested PR to band-aid the SBT build. -- This message was sent by Atlassian JIRA (v6.2#6252)
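A hedged sketch of the kind of band-aid such a PR might apply in the SBT build (not the actual patch; the dependency shown is only illustrative): exclude the Servlet 2.5 artifacts so that only the 3.0 API pulled in by Jetty survives.
{code}
// build.sbt fragment: keep the Servlet 2.5 API out of the assembly.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" excludeAll(
  ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api"),
  ExclusionRule(organization = "javax.servlet", name = "servlet-api")
)
{code}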
[jira] [Commented] (SPARK-2158) FileAppenderSuite is not cleaning up after itself
[ https://issues.apache.org/jira/browse/SPARK-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060161#comment-14060161 ] Sean Owen commented on SPARK-2158: -- I tried to clean this up a while ago, though I think that predates your comment. However, I don't see this file after running tests, nor do I see it being created. Could it be due to an unusual termination in the test? > FileAppenderSuite is not cleaning up after itself > - > > Key: SPARK-2158 > URL: https://issues.apache.org/jira/browse/SPARK-2158 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Mark Hamstra >Assignee: Mark Hamstra >Priority: Trivial > Fix For: 1.1.0 > > > FileAppenderSuite is leaving behind the file core/stdout -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2278) groupBy & groupByKey should support custom comparator
[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060158#comment-14060158 ] Sean Owen commented on SPARK-2278: -- Isn't this exactly what the first argument to groupBy and sortBy does? You define grouping and sorting on a transformation of the key instead of the key. It's not precisely what you mean but has the same effect. Maybe more importantly it matches Scala's collections API. > groupBy & groupByKey should support custom comparator > - > > Key: SPARK-2278 > URL: https://issues.apache.org/jira/browse/SPARK-2278 > Project: Spark > Issue Type: New Feature > Components: Java API >Affects Versions: 1.0.0 >Reporter: Hans Uhlig > > To maintain parity with MapReduce you should be able to specify a custom key > equality function in groupBy/groupByKey similar to sortByKey. -- This message was sent by Atlassian JIRA (v6.2#6252)
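A small Scala illustration of that point (the Java API mirrors it), assuming an existing SparkContext `sc`: grouping and sorting on a transformation of the key gives the effect of a custom equality or ordering.
{code}
val words = sc.parallelize(Seq("Spark", "spark", "SPARK", "Hadoop"))
// "Custom equality": words are grouped together if they match ignoring case.
val grouped = words.groupBy(w => w.toLowerCase)   // RDD[(String, Iterable[String])]
// "Custom ordering": sort by the derived key rather than supplying a Comparator.
val sorted = words.sortBy(w => w.toLowerCase)
{code}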
[jira] [Commented] (SPARK-2354) BitSet Range Expanded when creating new one
[ https://issues.apache.org/jira/browse/SPARK-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060156#comment-14060156 ] Sean Owen commented on SPARK-2354: -- These end up with the same effect. Let's say A is created with numBits=50 and B is created with numBits=70. A will have a capacity of 64 and B will have a capacity of 128, since they internally allocate 1 and 2 longs of storage, respectively. A|B needs to accommodate at least 70 bits, yes. Whether it is created with numBits=70 (your suggestion) or numBits=128 (the current code), you end up with a capacity of 128. Nothing is being expanded needlessly; the result is the same. I think the current code is fine. > BitSet Range Expanded when creating new one > --- > > Key: SPARK-2354 > URL: https://issues.apache.org/jira/browse/SPARK-2354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 >Reporter: Yijie Shen >Priority: Minor > > BitSet has a constructor parameter named "numBits: Int" and indicate the bit > num inside. > And also, there is a function called "capacity" which represents the long > words number to hold the bits. > When creating new BitSet,for example in '|', I thought the new created one > shouldn't be the size of longer words' length, instead, it should be the > longer set's num of bit > {code}def |(other: BitSet): BitSet = { > val newBS = new BitSet(math.max(numBits, other.numBits)) > // I know by now the numBits isn't a field > {code} > Does it have any other reason to expand the BitSet range I don't know? -- This message was sent by Atlassian JIRA (v6.2#6252)
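For reference, a sketch of the word-rounding behind those numbers; it mirrors what any long-backed bit set does and is not the exact Spark source.
{code}
// Capacity is numBits rounded up to a whole number of 64-bit long words.
def capacity(numBits: Int): Int = ((numBits + 63) / 64) * 64

println(capacity(50))                 // 64  -> 1 long word
println(capacity(70))                 // 128 -> 2 long words
println(capacity(math.max(50, 70)))   // 128 -> same capacity whether built with 70 or 128 bits
{code}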
[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060152#comment-14060152 ] Sean Owen commented on SPARK-2356: -- This isn't specific to Spark: http://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path And if you look at when this code is called in SparkContext, it's from the hadoopRDD() method. You will certainly end up using Hadoop code if your code access Hadoop functionality, so I think it is behaving as expected. > Exception: Could not locate executable null\bin\winutils.exe in the Hadoop > --- > > Key: SPARK-2356 > URL: https://issues.apache.org/jira/browse/SPARK-2356 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Kostiantyn Kudriavtsev > > I'm trying to run some transformation on Spark, it works fine on cluster > (YARN, linux machines). However, when I'm trying to run it on local machine > (Windows 7) under unit test, I got errors (I don't use Hadoop, I'm read file > from local filesystem): > 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the > hadoop binary path > java.io.IOException: Could not locate executable null\bin\winutils.exe in the > Hadoop binaries. > at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) > at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) > at org.apache.hadoop.util.Shell.(Shell.java:326) > at org.apache.hadoop.util.StringUtils.(StringUtils.java:76) > at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) > at org.apache.hadoop.security.Groups.(Groups.java:77) > at > org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) > at > org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) > at > org.apache.spark.deploy.SparkHadoopUtil.(SparkHadoopUtil.scala:36) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:109) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala) > at org.apache.spark.SparkContext.(SparkContext.scala:228) > at org.apache.spark.SparkContext.(SparkContext.scala:97) > It's happend because Hadoop config is initialised each time when spark > context is created regardless is hadoop required or not. > I propose to add some special flag to indicate if hadoop config is required > (or start this configuration manually) -- This message was sent by Atlassian JIRA (v6.2#6252)
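For completeness, the workaround commonly suggested on that Stack Overflow thread, under the assumption that it applies here (it is not an official fix, and the path is hypothetical): point Hadoop's Shell class at a directory containing bin\winutils.exe before the SparkContext is created.
{code}
// Expects C:\hadoop\bin\winutils.exe to exist on the Windows machine running the test.
System.setProperty("hadoop.home.dir", """C:\hadoop""")
val sc = new org.apache.spark.SparkContext("local", "unit-test")
{code}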
[jira] [Commented] (SPARK-2414) Remove jquery
[ https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060151#comment-14060151 ] Sean Owen commented on SPARK-2414: -- Note that jQuery is MIT licensed. It's fine to include its source but the Spark LICENSE file needs to reference it and its license if it's kept. Take a look for the section in that file, and see http://www.apache.org/dev/licensing-howto.html Or of course removing it moots the point. > Remove jquery > - > > Key: SPARK-2414 > URL: https://issues.apache.org/jira/browse/SPARK-2414 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Minor > > SPARK-2384 introduces jquery for tooltip display. We can probably just create > a very simple javascript for tooltip instead of pulling in jquery. > https://github.com/apache/spark/pull/1314 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2442) Add a Hadoop Writable serializer
[ https://issues.apache.org/jira/browse/SPARK-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060150#comment-14060150 ] Sean Owen commented on SPARK-2442: -- I think this duplicates https://issues.apache.org/jira/browse/SPARK-2421 > Add a Hadoop Writable serializer > > > Key: SPARK-2442 > URL: https://issues.apache.org/jira/browse/SPARK-2442 > Project: Spark > Issue Type: Bug >Reporter: Hari Shreedharan > > Using data read from hadoop files in shuffles can cause exceptions with the > following stacktrace: > {code} > java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:179) > at > org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161) > at > org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at org.apache.spark.scheduler.Task.run(Task.scala:51) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:679) > {code} > This though seems to go away if Kyro serializer is used. I am wondering if > adding a Hadoop-writables friendly serializer makes sense as it is likely to > perform better than Kyro without registration, since Writables don't > implement Serializable - so the serialization might not be the most efficient. -- This message was sent by Atlassian JIRA (v6.2#6252)
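Since the comment notes that the exception goes away with the Kryo serializer, here is a minimal configuration sketch; the setting names are standard Spark configuration, while the app name is hypothetical.
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("writable-shuffle")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Writable classes can additionally be registered through a custom KryoRegistrator
// (spark.kryo.registrator) for more compact output.
{code}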
[jira] [Commented] (SPARK-524) spark integration issue with Cloudera hadoop
[ https://issues.apache.org/jira/browse/SPARK-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060145#comment-14060145 ] Nicholas Chammas commented on SPARK-524: +1 for cleanup of issues that likely have no further action. > spark integration issue with Cloudera hadoop > > > Key: SPARK-524 > URL: https://issues.apache.org/jira/browse/SPARK-524 > Project: Spark > Issue Type: Bug >Reporter: openreserach > > Hi, > 1. I am using single EC2 instance with pre-built mesos (ami-0fcb7966) (Same > issue if I build mesos from source code in locall VM) > 2. Follow instruction on > https://github.com/mesos/spark/wiki/Running-spark-on-mesos with some tweaks. > 3. I install Cloudera cdhu5 by yum (not using pre-built hadoop due to lack of > document) > 4. ./spartk-shell.sh > import spark._ > val sc = new SparkContext("localhost:5050","passwd") > val ec2 = sc.textFile("hdfs://localhost:8020/tmp/passwd") > IF I keep val HADOOP_VERSION = "0.20.205.0" in project/SparkBuild.scala > at val file = sc.textFile("hdfs://localhost:8020/tmp/passwd") > I am getting error > Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. > (client = 61, server = 63) > IF I set val HADOOP_VERSION = "0.20.2-cdh3u5" or val HADOOP_VERSION = > "0.20.2-cdh3u3" > I am getting error at ec2.count() > ERROR spark.SimpleJob: Task 0:0 failed more than 4 times; aborting job > like the one reported at > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201108.mbox/%3cbd25ae7a-c9dc-4020-ad40-41c66dcaa...@eecs.berkeley.edu%3E > Please let me know if you cannot replicate this error, and give more > instruction on how Spark integrate with Cloudera Hadoop > Thanks > -QH -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2465) Use long as user / item ID for ALS
[ https://issues.apache.org/jira/browse/SPARK-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060127#comment-14060127 ] Sean Owen commented on SPARK-2465: -- https://github.com/apache/spark/pull/1393 > Use long as user / item ID for ALS > -- > > Key: SPARK-2465 > URL: https://issues.apache.org/jira/browse/SPARK-2465 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.0.1 >Reporter: Sean Owen >Priority: Minor > > I'd like to float this for consideration: use longs instead of ints for user > and product IDs in the ALS implementation. > The main reason for is that identifiers are not generally numeric at all, and > will be hashed to an integer. (This is a separate issue.) Hashing to 32 bits > means collisions are likely after hundreds of thousands of users and items, > which is not unrealistic. Hashing to 64 bits pushes this back to billions. > It would also mean numeric IDs that happen to be larger than the largest int > can be used directly as identifiers. > On the downside of course: 8 bytes instead of 4 bytes of memory used per > Rating. > Thoughts? I will post a PR so as to show what the change would be. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2465) Use long as user / item ID for ALS
Sean Owen created SPARK-2465: Summary: Use long as user / item ID for ALS Key: SPARK-2465 URL: https://issues.apache.org/jira/browse/SPARK-2465 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.0.1 Reporter: Sean Owen Priority: Minor I'd like to float this for consideration: use longs instead of ints for user and product IDs in the ALS implementation. The main reason for is that identifiers are not generally numeric at all, and will be hashed to an integer. (This is a separate issue.) Hashing to 32 bits means collisions are likely after hundreds of thousands of users and items, which is not unrealistic. Hashing to 64 bits pushes this back to billions. It would also mean numeric IDs that happen to be larger than the largest int can be used directly as identifiers. On the downside of course: 8 bytes instead of 4 bytes of memory used per Rating. Thoughts? I will post a PR so as to show what the change would be. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2398) Trouble running Spark 1.0 on Yarn
[ https://issues.apache.org/jira/browse/SPARK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060113#comment-14060113 ] Mridul Muralidharan commented on SPARK-2398: As discussed in the PR, I am attempting to list the various factors which contribute to overhead. Note, this is not exhaustive (yet) - please add more to this JIRA - so that when we are reasonably sure, we can model the expected overhead based on these factors. These factors are typically off-heap - since anything within the heap is budgeted for by Xmx and enforced by the VM - and so should ideally (though not always in practice, see gc overheads) not exceed the Xmx value.
1) 256 KB per socket accepted via ConnectionManager for inter-worker comm (setReceiveBufferSize). Typically, there will be (numExecutors - 1) sockets open.
2) 128 KB per socket for writing output to dfs. For reads, this does not seem to be configured - and should be 8k per socket iirc. Typically 1 per executor at a given point in time?
3) 256k for each akka socket for send/receive buffer. One per worker? (to talk to the master) - so 512kb?
4) If I am not wrong, netty might allocate multiple "spark.akka.frameSize"-sized direct buffers. There might be a few of these allocated and pooled/reused. I did not go into the netty code in detail though. If someone else with more know-how can clarify, that would be great! Default of 10mb for spark.akka.frameSize.
5) The default size of the assembled spark jar is about 12x mb (and changing) - though not all classes get loaded, the overhead would be some function of this. The actual footprint would be higher than the on-disk size. IIRC this is outside of the heap - [~sowen], any comments on this? I have not looked into these in like 10 years now!
6) Per-thread (Xss) overhead of 1mb (for a 64-bit VM). Last I recall, we have about 220-odd threads - not sure if this was at the master or on the workers. Of course, this is dependent on the various threadpools we use (io, computation, etc), akka and netty config, etc.
7) Disk read overhead. Thanks to [~pwendell]'s fix, at least for small files the overhead is not too high - since we do not mmap files but read them directly. But for anything larger than 8kb (default), we use memory-mapped buffers. The actual overhead depends on the number of files opened for read via DiskStore - and the entire file contents get mmap'ed into virtual memory. Note that there is also some non-virtual-memory overhead at the native level for these buffers. The actual number of files opened should be carefully tracked to understand the effect of this on spark overhead, since this aspect is changing a lot of late. Impact is on shuffle, disk-persisted rdds, among others. The actual value would be application dependent (how large the data is!)
8) The overhead introduced by the VM not being able to reclaim memory completely (the cost of moving data vs the amount of space reclaimed). Ideally, this should be low - but it would be dependent on the heap space, collector used, among other things. I am not very knowledgeable about recent advances in gc collectors, so I hesitate to put a number to this.
I am sure this is not an exhaustive list, please do add to this. In our case specifically, and [~tgraves] could add more, the number of containers can be high (300+ is easily possible), and memory per container is modest (8gig usually).
To add details of observed overhead patterns (from the PR discussion):
a) I have had an in-house GBDT implementation run without customizing overhead (so the default of 384 mb) with a 12gb container and 22 nodes on a reasonably large dataset.
b) I have had to customize overhead to 1.7gb for collaborative filtering with an 8gb container and 300 nodes (on a fairly large dataset).
c) I have had to minimally customize overhead to do an in-house QR factorization of a 50k x 50k distributed dense matrix on 45 nodes at 12 gb each (this was incorrectly specified in the PR discussion).
> Trouble running Spark 1.0 on Yarn > -- > > Key: SPARK-2398 > URL: https://issues.apache.org/jira/browse/SPARK-2398 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Nishkam Ravi > > Trouble running workloads in Spark-on-YARN cluster mode for Spark 1.0. > For example: SparkPageRank when run in standalone mode goes through without > any errors (tested for up to 30GB input dataset on a 6-node cluster). Also > runs fine for a 1GB dataset in yarn cluster mode. Starts to choke (in yarn > cluster mode) as the input data size is increased. Confirmed for 16GB input > dataset. > The same workload runs fine with Spark 0.9 in both standalone and yarn > cluster mode (for up to 30 GB input dataset on a 6-node cluster). > Commandline used: > (/opt/cloudera/parcels/CDH/lib/spark
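To make the list of factors above concrete, here is a back-of-envelope sum in Scala using the per-factor sizes from the comment; the executor, thread, and buffer counts are illustrative assumptions, not measurements.
{code}
val kb = 1024L; val mb = 1024L * kb

val numExecutors   = 300
val commBuffers    = (numExecutors - 1) * 256 * kb  // ConnectionManager receive buffers (factor 1)
val dfsWriteBuffer = 128 * kb                       // one DFS write socket (factor 2)
val akkaBuffers    = 2 * 256 * kb                   // akka send/receive buffers (factor 3)
val nettyFrames    = 4 * 10 * mb                    // assume 4 pooled frameSize-sized buffers (factor 4)
val classFootprint = 150 * mb                       // guess at loaded-class footprint (factor 5)
val threadStacks   = 220 * mb                       // ~220 threads at 1 MB Xss each (factor 6)

val totalMb = (commBuffers + dfsWriteBuffer + akkaBuffers +
               nettyFrames + classFootprint + threadStacks) / mb
println(s"~$totalMb MB of off-heap overhead before mmapped shuffle files and GC slack")
{code}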
[jira] [Commented] (SPARK-524) spark integration issue with Cloudera hadoop
[ https://issues.apache.org/jira/browse/SPARK-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060070#comment-14060070 ] Sean Owen commented on SPARK-524: - Can I ask a meta-question? This JIRA is an example, but just one. I see hundreds of JIRAs that likely have no further action. Some are likely obsoleted by time and subsequent changes, like this one -- CDH integration is much different now and presumably fixes this. Some are feature requests or changes that de facto don't have support and therefore won't be committed. These seem like they should be closed, for clarity. Bugs are riskier to close in case they identify a real issue that still exists. Is there any momentum for, or anything I can do, to help clean up things like this just to start? > spark integration issue with Cloudera hadoop > > > Key: SPARK-524 > URL: https://issues.apache.org/jira/browse/SPARK-524 > Project: Spark > Issue Type: Bug >Reporter: openreserach > > Hi, > 1. I am using single EC2 instance with pre-built mesos (ami-0fcb7966) (Same > issue if I build mesos from source code in locall VM) > 2. Follow instruction on > https://github.com/mesos/spark/wiki/Running-spark-on-mesos with some tweaks. > 3. I install Cloudera cdhu5 by yum (not using pre-built hadoop due to lack of > document) > 4. ./spartk-shell.sh > import spark._ > val sc = new SparkContext("localhost:5050","passwd") > val ec2 = sc.textFile("hdfs://localhost:8020/tmp/passwd") > IF I keep val HADOOP_VERSION = "0.20.205.0" in project/SparkBuild.scala > at val file = sc.textFile("hdfs://localhost:8020/tmp/passwd") > I am getting error > Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. > (client = 61, server = 63) > IF I set val HADOOP_VERSION = "0.20.2-cdh3u5" or val HADOOP_VERSION = > "0.20.2-cdh3u3" > I am getting error at ec2.count() > ERROR spark.SimpleJob: Task 0:0 failed more than 4 times; aborting job > like the one reported at > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201108.mbox/%3cbd25ae7a-c9dc-4020-ad40-41c66dcaa...@eecs.berkeley.edu%3E > Please let me know if you cannot replicate this error, and give more > instruction on how Spark integrate with Cloudera Hadoop > Thanks > -QH -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2382) build error:
[ https://issues.apache.org/jira/browse/SPARK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060045#comment-14060045 ] Mukul Jain commented on SPARK-2382: --- Hi Sean, Well I thought it required more than proxy setting fix. If I have a chance then I will try to reproduce it next week. If you want to close in the meanwhile then that is fine I am not blocked anymore. Thanks Sent from my iPhone > build error: > - > > Key: SPARK-2382 > URL: https://issues.apache.org/jira/browse/SPARK-2382 > Project: Spark > Issue Type: Question > Components: Build >Affects Versions: 1.0.0 > Environment: Ubuntu 12.0.4 precise. > spark@ubuntu-cdh5-spark:~/spark-1.0.0$ mvn -version > Apache Maven 3.0.4 > Maven home: /usr/share/maven > Java version: 1.6.0_31, vendor: Sun Microsystems Inc. > Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.11.0-15-generic", arch: "amd64", family: "unix" >Reporter: Mukul Jain > Labels: newbie > > Unable to build. maven can't download dependency .. checked my http_proxy and > https_proxy setting they are working fine. Other http and https dependencies > were downloaded fine.. build process gets stuck always at this repo. manually > down loading also fails and receive an exception. > [INFO] > > [INFO] Building Spark Project External MQTT 1.0.0 > [INFO] > > Downloading: > https://repository.apache.org/content/repositories/releases/org/eclipse/paho/mqtt-client/0.4.0/mqtt-client-0.4.0.pom > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: I/O exception (java.net.ConnectException) caught when processing > request: Connection timed out > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: Retrying request -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2382) build error:
[ https://issues.apache.org/jira/browse/SPARK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060036#comment-14060036 ] Sean Owen commented on SPARK-2382: -- It sounds like it was an issue with your proxy then, no? That is indeed common, but it is not related to Spark. > build error: > - > > Key: SPARK-2382 > URL: https://issues.apache.org/jira/browse/SPARK-2382 > Project: Spark > Issue Type: Question > Components: Build >Affects Versions: 1.0.0 > Environment: Ubuntu 12.0.4 precise. > spark@ubuntu-cdh5-spark:~/spark-1.0.0$ mvn -version > Apache Maven 3.0.4 > Maven home: /usr/share/maven > Java version: 1.6.0_31, vendor: Sun Microsystems Inc. > Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.11.0-15-generic", arch: "amd64", family: "unix" >Reporter: Mukul Jain > Labels: newbie > > Unable to build. maven can't download dependency .. checked my http_proxy and > https_proxy setting they are working fine. Other http and https dependencies > were downloaded fine.. build process gets stuck always at this repo. manually > down loading also fails and receive an exception. > [INFO] > > [INFO] Building Spark Project External MQTT 1.0.0 > [INFO] > > Downloading: > https://repository.apache.org/content/repositories/releases/org/eclipse/paho/mqtt-client/0.4.0/mqtt-client-0.4.0.pom > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: I/O exception (java.net.ConnectException) caught when processing > request: Connection timed out > Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector > executeWithRetry > INFO: Retrying request -- This message was sent by Atlassian JIRA (v6.2#6252)