
[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9092


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155248366
  
merging with master, branch-1.6
Thank you for the PR!



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155243776
  
**[Test build #45439 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45439/consoleFull)**
 for PR 9092 at commit 
[`2663cbf`](https://github.com/apache/spark/commit/2663cbf213548c0631e88d886e8010f4dcac163c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155244180
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155239682
  
LGTM pending tests



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155231560
  
**[Test build #45439 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45439/consoleFull)**
 for PR 9092 at commit 
[`2663cbf`](https://github.com/apache/spark/commit/2663cbf213548c0631e88d886e8010f4dcac163c).



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155230865
  
Merged build started.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155230847
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9092#discussion_r44311675
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala ---
@@ -100,10 +100,25 @@ class RegexTokenizer(override val uid: String)
   /** @group getParam */
   def getPattern: String = $(pattern)
 
-  setDefault(minTokenLength -> 1, gaps -> true, pattern -> "\\s+")
+  /**
+   * Indicates whether to convert all characters to lowercase before tokenizing.
+   * Default: false
--- End diff --

default needs to be updated



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-155150009
  
Looks good except that one outdated doc line



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154907032
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154906966
  
**[Test build #45328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45328/consoleFull)**
 for PR 9092 at commit 
[`43fd8e9`](https://github.com/apache/spark/commit/43fd8e954b53599ece65c5ee48f24c9b036a75a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154903263
  
**[Test build #45328 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45328/consoleFull)**
 for PR 9092 at commit 
[`43fd8e9`](https://github.com/apache/spark/commit/43fd8e954b53599ece65c5ee48f24c9b036a75a6).



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154902998
  
Merged build started.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154902986
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154861500
  
**[Test build #45314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45314/consoleFull)**
 for PR 9092 at commit 
[`0c07366`](https://github.com/apache/spark/commit/0c07366ea6d397859b5761fd67f31db851834629).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154861510
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154859228
  
**[Test build #45314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45314/consoleFull)**
 for PR 9092 at commit 
[`0c07366`](https://github.com/apache/spark/commit/0c07366ea6d397859b5761fd67f31db851834629).



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154858828
  
Merged build started.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154858819
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread hhbyyh

Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154832573
  
Yes, I agree.
1. Tokenizer and RegexTokenizer should have consistent behavior.
2. Whether to set toLower to true is a matter of preference. I assume for 
ML applications it's more common to have toLower as true. 
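A minimal Scala sketch of the consistency point, assuming Tokenizer lowercases its input and then splits on whitespace, and using a simplified stand-in for RegexTokenizer that only has the default `\\s+` gaps pattern. `tokenizerStyle` and `regexTokenizerStyle` are hypothetical helpers for illustration, not Spark APIs. With the option off, the two produce differently cased tokens for the same input; with it on, they agree:

```scala
// Hypothetical helpers approximating the two tokenizers; not Spark classes.
object TokenizerConsistencySketch {
  // Assumed Tokenizer behavior: lowercase first, then split on whitespace.
  def tokenizerStyle(text: String): List[String] =
    text.toLowerCase.split("\\s").toList

  // Simplified RegexTokenizer: default "\\s+" gaps pattern plus the
  // toLowercase option under discussion. Empty tokens are dropped,
  // mirroring a minimum token length of 1.
  def regexTokenizerStyle(text: String, toLowercase: Boolean): List[String] = {
    val str = if (toLowercase) text.toLowerCase else text
    "\\s+".r.split(str).toList.filter(_.length >= 1)
  }

  def main(args: Array[String]): Unit = {
    val text = "Spark ML Pipelines"
    println(tokenizerStyle(text))                           // List(spark, ml, pipelines)
    println(regexTokenizerStyle(text, toLowercase = false)) // List(Spark, ML, Pipelines)
    println(regexTokenizerStyle(text, toLowercase = true))  // List(spark, ml, pipelines)
  }
}
```

With a `false` default, a pipeline that swaps one tokenizer for the other would silently start counting "Spark" and "spark" as different terms, which is the inconsistency raised above.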
 



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-154743585
  
I'm wondering now if we should set it to convert to lowercase by default.  
I know it breaks behavior, but otherwise, we'll introduce an inconsistency in 
the API (between Tokenizer and RegexTokenizer) which will be around for a long 
time.  What do you think?



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9092#discussion_r44216481
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -69,6 +69,18 @@ class RegexTokenizerSuite extends SparkFunSuite with MLlibTestSparkContext {
 ))
 testRegexTokenizer(tokenizer2, dataset2)
   }
+
+  test("RegexTokenizer with toLowercase true"){
--- End diff --

style: space before brace at end of line



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9092#discussion_r44216479
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala ---
@@ -100,10 +100,25 @@ class RegexTokenizer(override val uid: String)
   /** @group getParam */
   def getPattern: String = $(pattern)
 
-  setDefault(minTokenLength -> 1, gaps -> true, pattern -> "\\s+")
+  /**
+   * Indicates whether to convert all characters to lowercase before tokenizing.
+   * Default: false
+   * @group param
+   */
+  val toLowercase: BooleanParam = new BooleanParam(this, "toLowercase",
--- End diff --

final val



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147640440
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147640443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43630/
Test PASSed.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147639962
  
  [Test build #43630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43630/console)
 for   PR 9092 at commit 
[`ce09ef5`](https://github.com/apache/spark/commit/ce09ef532f2ec633e508840097fd0ac1b5285284).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147628224
  
  [Test build #43630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43630/consoleFull)
 for   PR 9092 at commit 
[`ce09ef5`](https://github.com/apache/spark/commit/ce09ef532f2ec633e508840097fd0ac1b5285284).



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147627765
  
Merged build started.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9092#issuecomment-147627734
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread hhbyyh

GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/9092

[SPARK-11069] [ML] Add RegexTokenizer option to convert to lowercase

jira: https://issues.apache.org/jira/browse/SPARK-11069 
quotes from jira: 
Tokenizer converts strings to lowercase automatically, but RegexTokenizer 
does not. It would be nice to add an option to RegexTokenizer to convert to 
lowercase. Proposal:
* call the Boolean Param "toLowercase"
* set default to false (so behavior does not change)

Actually sklearn converts to lowercase before tokenizing too
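A minimal standalone sketch of the proposed option in plain Scala, with no Spark dependency. `regexTokenize` is a hypothetical helper whose parameters mirror the RegexTokenizer Params named in this thread (`pattern`, `gaps`, `minTokenLength`, `toLowercase`); the defaults follow the `setDefault` line in the diff, plus `toLowercase = false` as proposed here. It approximates the behavior and is not Spark's code. Lowercasing is applied before the regex, so the pattern sees the lowercased text:

```scala
object RegexTokenizeSketch {
  // Hypothetical approximation of RegexTokenizer's transform; not Spark's class.
  def regexTokenize(
      text: String,
      pattern: String = "\\s+",     // default pattern from setDefault in the diff
      gaps: Boolean = true,         // true: pattern matches separators; false: tokens
      minTokenLength: Int = 1,      // shorter tokens are dropped
      toLowercase: Boolean = false  // default as proposed in the JIRA
  ): List[String] = {
    val re = pattern.r
    // Convert to lowercase before tokenizing, as the proposal describes.
    val str = if (toLowercase) text.toLowerCase else text
    val tokens =
      if (gaps) re.split(str).toList // split on the separator pattern
      else re.findAllIn(str).toList  // collect the pattern matches as tokens
    tokens.filter(_.length >= minTokenLength)
  }

  def main(args: Array[String]): Unit = {
    println(regexTokenize("Hello World"))                     // List(Hello, World)
    println(regexTokenize("Hello World", toLowercase = true)) // List(hello, world)
    println(regexTokenize("Hi, THERE", pattern = "\\w+", gaps = false,
      toLowercase = true))                                    // List(hi, there)
  }
}
```

Later in the discussion the default was changed to `true` for consistency with Tokenizer; in this sketch that would only change the `toLowercase` default value.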

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark tokenLower

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9092


commit ce09ef532f2ec633e508840097fd0ac1b5285284
Author: Yuhao Yang 
Date:   2015-10-13T07:14:55Z

add tolowercase to regexTokenizer




