Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Prashant Sharma
I am getting a 404 for the link
https://repository.apache.org/content/repositories/orgapachespark-1217.

--Prashant


On Fri, Dec 9, 2016 at 10:43 AM, Michael Allman 
wrote:

> I believe https://github.com/apache/spark/pull/16122 needs to be included
> in Spark 2.1. It's a simple bug fix to some functionality introduced in 2.1.
> Unfortunately, it has only been verified manually. There's no unit test that
> covers it, and building one is far from trivial.
>
> Michael
>
>
>
>
> On Dec 8, 2016, at 12:39 AM, Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 11, 2016 at 1:00 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc2 (080717497365b83bc202ab16812ced93eb1ea7bd)
>
> List of JIRA tickets resolved are: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1217
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/
>
>
> (Note that the docs and staging repo are still being uploaded and will be
> available soon)
>
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ===
> What should happen to JIRA tickets still targeting 2.1.0?
> ===
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
>
>


Re: Issue in using DenseVector in RowMatrix, error could be due to ml and mllib package changes

2016-12-08 Thread Nick Pentreath
Yes, most likely this is because HashingTF returns ml vectors, while you need
mllib vectors for RowMatrix.

I'd recommend using the vector conversion utils (I think in
mllib.linalg.Vectors, but I'm on mobile right now so can't recall exactly).
There are util methods for converting single vectors as well as vector
columns of a DataFrame. Check the mllib user guide for 2.0 for details.
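Roughly something like this (untested, typed from memory; I believe there is
also a DataFrame-level util, something like MLUtils.convertVectorColumnsFromML
in mllib.util, if you prefer to convert the column first):

```scala
// Untested sketch, assuming Spark 2.x and the "rescaledData" DataFrame from
// the code below, whose "features" column holds the new ml-style vectors.
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val rows = rescaledData.select("features").rdd
  .map(r => OldVectors.fromML(r.getAs[MLVector]("features"))) // ml -> mllib vector

val mat = new RowMatrix(rows) // RowMatrix expects RDD[mllib.linalg.Vector]
```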
On Fri, 9 Dec 2016 at 04:42, satyajit vegesna 
wrote:

> Hi All,
>
> PFB code.
>
>
> import org.apache.spark.ml.feature.{HashingTF, IDF}
> import org.apache.spark.ml.linalg.SparseVector
> import org.apache.spark.mllib.linalg.distributed.RowMatrix
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.{SparkConf, SparkContext}
>
> /**
>   * Created by satyajit on 12/7/16.
>   */
> object DIMSUMusingtf extends App {
>
>   val conf = new SparkConf()
> .setMaster("local[1]")
> .setAppName("testColsim")
>   val sc = new SparkContext(conf)
>   val spark = SparkSession
> .builder
> .appName("testColSim").getOrCreate()
>
>   import org.apache.spark.ml.feature.Tokenizer
>
>   val sentenceData = spark.createDataFrame(Seq(
> (0, "Hi I heard about Spark"),
> (0, "I wish Java could use case classes"),
> (1, "Logistic regression models are neat")
>   )).toDF("label", "sentence")
>
>   val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
>
>   val wordsData = tokenizer.transform(sentenceData)
>
>
>   val hashingTF = new HashingTF()
> .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
>
>   val featurizedData = hashingTF.transform(wordsData)
>
>
>   val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
>   val idfModel = idf.fit(featurizedData)
>   val rescaledData = idfModel.transform(featurizedData)
>   rescaledData.show()
>   rescaledData.select("features", "label").take(3).foreach(println)
>   val check = rescaledData.select("features")
>
>   val row = check.rdd.map(row => row.getAs[SparseVector]("features"))
>
>   val mat = new RowMatrix(row) // I am basically trying to use Dense.vector as a
>   // direct input to RowMatrix, but I get an error that the RowMatrix
>   // constructor cannot be resolved
>
>   row.foreach(println)
> }
>
> Any help would be appreciated.
>
> Regards,
> Satyajit.
>
>
>
>


Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Michael Allman
I believe https://github.com/apache/spark/pull/16122 needs to be included in
Spark 2.1. It's a simple bug fix to some functionality introduced in 2.1.
Unfortunately, it has only been verified manually. There's no unit test that
covers it, and building one is far from trivial.

Michael



> On Dec 8, 2016, at 12:39 AM, Reynold Xin  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 2.1.0. The vote is open until Sun, December 11, 2016 at 1:00 PT and passes if 
> a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
> 
> 
> To learn more about Apache Spark, please see http://spark.apache.org/ 
> 
> 
> The tag to be voted on is v2.1.0-rc2 
> (080717497365b83bc202ab16812ced93eb1ea7bd)
> 
> List of JIRA tickets resolved are:  
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>  
> 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1217 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/ 
> 
> 
> 
> (Note that the docs and staging repo are still being uploaded and will be 
> available soon)
> 
> 
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions.
> 
> ===
> What should happen to JIRA tickets still targeting 2.1.0?
> ===
> Committers should look at those and triage. Extremely important bug fixes, 
> documentation, and API tweaks that impact compatibility should be worked on 
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.



Re: how can I set the log configuration file for spark history server ?

2016-12-08 Thread Don Drake
You can update $SPARK_HOME/conf/spark-env.sh to set the environment
variable SPARK_HISTORY_OPTS.

See
http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options
for the options (e.g. spark.history.fs.logDirectory) you can set.

There is log rotation built into the history server (by time, not size), but
you need to enable/configure it.
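For example, a rough (untested) sketch; the paths and sizes below are just
placeholders:

```
# In $SPARK_HOME/conf/spark-env.sh: point the history server at its event-log
# directory and at its own log4j configuration file.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs:///spark-events \
  -Dlog4j.configuration=file:/etc/spark/log4j-history.properties"

# /etc/spark/log4j-history.properties (log4j 1.x, which Spark ships with) can
# then cap the file size with a rolling appender:
#   log4j.rootCategory=INFO, rolling
#   log4j.appender.rolling=org.apache.log4j.RollingFileAppender
#   log4j.appender.rolling.File=/var/log/spark/history-server.log
#   log4j.appender.rolling.MaxFileSize=100MB
#   log4j.appender.rolling.MaxBackupIndex=5
#   log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
#   log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```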

Hope that helps.

-Don

On Thu, Dec 8, 2016 at 9:20 PM, John Fang 
wrote:

> ./start-history-server.sh
> starting org.apache.spark.deploy.history.HistoryServer,
> logging to /home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
>
> Then the history server prints all logs to the XXX.sqa.zmf.out file, so I
> can't limit the maximum file size. I want to limit the size of the log file.
>



-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake 
800-733-2143


how can I set the log configuration file for spark history server ?

2016-12-08 Thread John Fang
./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
Then the history server prints all logs to the XXX.sqa.zmf.out file, so I can't
limit the maximum file size. I want to limit the size of the log file.

Issue in using DenseVector in RowMatrix, error could be due to ml and mllib package changes

2016-12-08 Thread satyajit vegesna
Hi All,

PFB code.


import org.apache.spark.ml.feature.{HashingTF, IDF}
import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by satyajit on 12/7/16.
  */
object DIMSUMusingtf extends App {

  val conf = new SparkConf()
.setMaster("local[1]")
.setAppName("testColsim")
  val sc = new SparkContext(conf)
  val spark = SparkSession
.builder
.appName("testColSim").getOrCreate()

  import org.apache.spark.ml.feature.Tokenizer

  val sentenceData = spark.createDataFrame(Seq(
(0, "Hi I heard about Spark"),
(0, "I wish Java could use case classes"),
(1, "Logistic regression models are neat")
  )).toDF("label", "sentence")

  val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")

  val wordsData = tokenizer.transform(sentenceData)


  val hashingTF = new HashingTF()
.setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)

  val featurizedData = hashingTF.transform(wordsData)


  val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
  val idfModel = idf.fit(featurizedData)
  val rescaledData = idfModel.transform(featurizedData)
  rescaledData.show()
  rescaledData.select("features", "label").take(3).foreach(println)
  val check = rescaledData.select("features")

  val row = check.rdd.map(row => row.getAs[SparseVector]("features"))

  val mat = new RowMatrix(row) // I am basically trying to use Dense.vector as a
  // direct input to RowMatrix, but I get an error that the RowMatrix
  // constructor cannot be resolved

  row.foreach(println)
}

Any help would be appreciated.

Regards,
Satyajit.


Fwd: Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
+dev

I forgot to add @user.

Dongjoon.

-- Forwarded message -
From: Dongjoon Hyun 
Date: Thu, Dec 8, 2016 at 16:00
Subject: Question about SPARK-11374 (skip.header.line.count)
To: 


Hi, All.



Could you give me some opinion?



There is an old SPARK issue, SPARK-11374, about removing header lines from a
text file.

Currently, Spark supports removing CSV header lines in the following way.



```
scala> spark.read.option("header","true").csv("/data").show
+---+---+
| c1| c2|
+---+---+
|  1|  a|
|  2|  b|
+---+---+
```



In the SQL world, we could support this the Hive way, via
`skip.header.line.count`.



```
scala> sql("CREATE TABLE t1 (id INT, value VARCHAR(10)) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data'
TBLPROPERTIES('skip.header.line.count'='1')")

scala> sql("SELECT * FROM t1").show
+---+-+
| id|value|
+---+-+
|  1|a|
|  2|b|
+---+-+
```



Although I made a PR for this based on the JIRA issue, I want to know whether
this is a really needed feature.

Is it needed for your use cases, or is it enough to remove the headers in a
preprocessing stage?

If this is too old and no longer appropriate these days, I'll close the PR and
JIRA issue as WON'T FIX.



Thank you all in advance!



Bests,

Dongjoon.



-

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
Hi, All.

Could you give me some opinion?

There is an old SPARK issue, SPARK-11374, about removing header lines from a
text file.
Currently, Spark supports removing CSV header lines in the following way.

```
scala> spark.read.option("header","true").csv("/data").show
+---+---+
| c1| c2|
+---+---+
|  1|  a|
|  2|  b|
+---+---+
```

In the SQL world, we could support this the Hive way, via `skip.header.line.count`.

```
scala> sql("CREATE TABLE t1 (id INT, value VARCHAR(10)) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data' 
TBLPROPERTIES('skip.header.line.count'='1')")
scala> sql("SELECT * FROM t1").show
+---+-+
| id|value|
+---+-+
|  1|a|
|  2|b|
+---+-+
```

Although I made a PR for this based on the JIRA issue, I want to know whether
this is a really needed feature.
Is it needed for your use cases, or is it enough to remove the headers in a
preprocessing stage?
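(For context, the preprocessing alternative I have in mind is roughly the
following; it assumes every input file starts with the same single header
line.)

```scala
// Rough sketch of the preprocessing alternative (assumes one identical header
// line at the top of every file under /data).
val raw    = spark.sparkContext.textFile("/data")
val header = raw.first()
val rows   = raw.filter(_ != header) // drop the header line(s) before parsing
```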
If this is too old and no longer appropriate these days, I'll close the PR and
JIRA issue as WON'T FIX.

Thank you all in advance!

Bests,
Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Shivaram Venkataraman
+0

I am not sure how much of a problem this is, but the pip packaging
seems to have changed the size of the hadoop-2.7 artifact. As you can
see at http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/,
the Hadoop 2.7 build is 359M, almost double the size of the other
Hadoop builds.

This comes from the fact that we build our pip package using the
Hadoop 2.7 profile [1], and the pip package is contained inside this
tarball. The fix for this is to exclude the pip package from the
distribution in [2].

Thanks
Shivaram

[1] 
https://github.com/apache/spark/blob/202fcd21ce01393fa6dfaa1c2126e18e9b85ee96/dev/create-release/release-build.sh#L242
[2] 
https://github.com/apache/spark/blob/202fcd21ce01393fa6dfaa1c2126e18e9b85ee96/dev/make-distribution.sh#L240

On Thu, Dec 8, 2016 at 12:39 AM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 11, 2016 at 1:00 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc2
> (080717497365b83bc202ab16812ced93eb1ea7bd)
>
> List of JIRA tickets resolved are:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1217
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/
>
>
> (Note that the docs and staging repo are still being uploaded and will be
> available soon)
>
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ===
> What should happen to JIRA tickets still targeting 2.1.0?
> ===
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-08 Thread Kazuaki Ishizaki
The line I pointed out would work correctly, because the type of this division
is double and d2i correctly handles the overflow cases.
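For example, a quick illustration of the d2i saturation I mean (Scala REPL):

```scala
// Casting an out-of-range double to int saturates at Int.MaxValue (JVM d2i
// semantics) instead of wrapping around to a negative value:
scala> (6.0e9 / 1.5).toInt   // 4.0e9 is larger than Int.MaxValue
res0: Int = 2147483647
```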

Kazuaki Ishizaki



From:   Nicholas Chammas 
To: Kazuaki Ishizaki/Japan/IBM@IBMJP, Reynold Xin 

Cc: Spark dev list 
Date:   2016/12/08 10:56
Subject: Re: Reduce memory usage of UnsafeInMemorySorter



Unfortunately, I don't have a repro, and I'm only seeing this at scale. 
But I was able to get around the issue by fiddling with the distribution 
of my data before asking GraphFrames to process it. (I think that's where 
the error was being thrown from.)

On Wed, Dec 7, 2016 at 7:32 AM Kazuaki Ishizaki  
wrote:
I do not have a repro, either.
But when I took a quick look at the file 'UnsafeInMemorySorter.java', I am
afraid there is a cast issue similar to
https://issues.apache.org/jira/browse/SPARK-18458 at the following line:
https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L156


Regards,
Kazuaki Ishizaki



From: Reynold Xin 
To: Nicholas Chammas 
Cc: Spark dev list 
Date: 2016/12/07 14:27
Subject: Re: Reduce memory usage of UnsafeInMemorySorter



This is not supposed to happen. Do you have a repro?


On Tue, Dec 6, 2016 at 6:11 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
[Re-titling thread.]
OK, I see that the exception from my original email is being triggered 
from this part of UnsafeInMemorySorter:
https://github.com/apache/spark/blob/v2.0.2/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L209-L212

So I can ask a more refined question now: How can I ensure that 
UnsafeInMemorySorter has room to insert new records? In other words, how 
can I ensure that hasSpaceForAnotherRecord() returns a true value?
Do I need:
More, smaller partitions?
More memory per executor?
Some Java or Spark option enabled?
etc.
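(Just to make those options concrete, I mean knobs along these lines; the
values are arbitrary:)

```scala
// Hypothetical tuning knobs, values picked arbitrarily:
spark.conf.set("spark.sql.shuffle.partitions", "2000") // more, smaller partitions
// ...or more memory per executor at submit time, e.g.:
//   spark-submit --executor-memory 8g ...
```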
I’m running Spark 2.0.2 on Java 7 and YARN. Would Java 8 help here? 
(Unfortunately, I cannot upgrade at this time, but it would be good to 
know regardless.)
This is morphing into a user-list question, so accept my apologies. Since 
I can’t find any information anywhere else about this, and the question 
is about internals like UnsafeInMemorySorter, I hope this is OK here.
Nick
On Mon, Dec 5, 2016 at 9:11 AM Nicholas Chammas nicholas.cham...@gmail.com
wrote:
I was testing out a new project at scale on Spark 2.0.2 running on YARN, 
and my job failed with an interesting error message:
TaskSetManager: Lost task 37.3 in stage 31.0 (TID 10684, server.host.name
): java.lang.IllegalStateException: There is no space for new record
05:27:09.573 at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:211)
05:27:09.574 at 
org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:127)
05:27:09.574 at 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:244)
05:27:09.575 at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
05:27:09.575 at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
05:27:09.576 at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
05:27:09.576 at 
org.apache.spark.sql.execution.WholeStageCodegenExec$anonfun$8$anon$1.hasNext(WholeStageCodegenExec.scala:370)
05:27:09.577 at 
scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
05:27:09.577 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
05:27:09.577 at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
05:27:09.578 at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
05:27:09.578 at org.apache.spark.scheduler.Task.run(Task.scala:86)
05:27:09.578 at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
05:27:09.579 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
05:27:09.579 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
05:27:09.579 at java.lang.Thread.run(Thread.java:745)

I’ve never seen this before, and searching on Google/DDG/JIRA doesn’t 
yield any results. There are no other errors coming from that executor, 
whether related to memory, storage space, or otherwise.
Could this be a bug? If so, how would I narrow down the source? Otherwise, 
how might I work around the issue?
Nick






Re: Publishing of the Spectral LDA model on Spark Packages

2016-12-08 Thread François Garillot
This is very cool! Thanks a lot for making this more accessible!

Best,
-- 
FG

On Wed, Dec 7, 2016 at 11:46 PM Jencir Lee  wrote:

> Hello,
>
> We just published the Spectral LDA model on Spark Packages. It’s an
> alternative approach to the LDA modelling based on tensor decompositions.
> We first build the 2nd, 3rd-moment tensors from empirical word counts, then
> orthogonalise them and perform decomposition on the 3rd-moment tensor. The
> convergence is guaranteed by theory, in contrast to most current
> approaches. We achieve comparable log-perplexity in much shorter running
> time.
>
> You could find the package at
>
> https://spark-packages.org/package/FurongHuang/SpectralLDA-TensorSpark
>
>
> We’d welcome any thoughts or feedback on it.
>
> Thanks very much,
>
> Furong Huang
> Jencir Lee
> Anima Anandkumar
>


Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-12-08 Thread Reynold Xin
This vote is closed in favor of rc2.


On Mon, Nov 28, 2016 at 5:25 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Thursday, December 1, 2016 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc1 (80aabc0bd33dc5661a90133156247e7a8c1bf7f5)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1216/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/
>
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ===
> What should happen to JIRA tickets still targeting 2.1.0?
> ===
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
>
>


[Spark SQL]: How does Spark HiveThriftServer handle idle sessions ?

2016-12-08 Thread Moriarty
In org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.init(),
SparkSQLSessionManager initializes "backgroundOperationPool" by creating a
ThreadPool directly instead of invoking
super.createBackgroundOperationPool().

This results in the idle-session-check thread in
org.apache.hive.service.cli.session.SessionManager not being created.

So does the Spark HiveThriftServer have any mechanism to clean up idle
sessions, or is this a bug that we should fix?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-How-does-Spark-HiveThriftServer-handle-idle-sessions-tp20173.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: modifications to ALS.scala

2016-12-08 Thread Georg Heiler
You can write some code, e.g. a custom estimator/transformer, in Spark's
namespace.
http://stackoverflow.com/a/40785438/2587904 might help you get started.
Be aware that private (i.e. Spark-internal) APIs may be subject to change from
release to release.

You will definitely require the spark-mllib dependency.

Currently, for my usage, I was not required to build a separate version of
mllib.
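A minimal, untested skeleton of what I mean (the class name is a placeholder;
the package declaration is the important part, since it gives you access to
private[ml] / private[recommendation] members):

```scala
// Placeholder skeleton: a custom transformer living inside Spark's namespace
// so it can reuse ALS internals without copying mllib.
package org.apache.spark.ml.recommendation

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

class MyALSVariant(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("myALSVariant"))

  override def transform(dataset: Dataset[_]): DataFrame = {
    // call into the ALS internals you want to customize here
    dataset.toDF()
  }

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): MyALSVariant = defaultCopy(extra)
}
```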
harini  wrote on Thu, Dec 8, 2016 at 00:23:

> I am new to development with Spark; how do I do that? Can I write up a custom
> implementation under package org.apache.spark.ml.recommendation, and specify
> "spark-mllib" along with others as a library dependency?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/modifications-to-ALS-scala-tp20167p20169.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>