Revision: 17247
          http://sourceforge.net/p/gate/code/17247
Author:   markagreenwood
Date:     2014-01-24 17:07:59 +0000 (Fri, 24 Jan 2014)
Log Message:
-----------
Updated the JAPE rule for finding hashtags so that it more closely matches
how Twitter determines what a hashtag is.

Modified Paths:
--------------
    gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape

Modified: gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape
===================================================================
--- gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape 2014-01-24 16:43:33 UTC (rev 17246)
+++ gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape 2014-01-24 17:07:59 UTC (rev 17247)
@@ -17,19 +17,20 @@
 // flag RT/MT/via @/by @ & via specially? (probably better to handle later)
 
 
-// Syntax of username & hashtag according to
+// Syntax of usernames
 // http://sentiment.christopherpotts.net/tokenizing.html#twitter
 // @+[\w_]+
-// \#+[\w_]+[\w_\-]*[\w_]+
 
+// and hashtags according to
+// https://support.twitter.com/articles/370610-my-hashtags-or-replies-aren-t-working
+// can include numbers but can't be just a number
+// punctuation breaks the tag, although underscores appear to be allowed
+
 Rule: Hashtag
-( ({Token.string == "#"})
-  ({Token.kind=="word"} | {Token.string=="_"}| {Token.string=="-"})
-  (
-    ({Token.kind=="word"} | {Token.kind=="number"} | {Token.string=="_"}| {Token.string=="-"})[0,5]
-    ({Token.kind=="word"} | {Token.kind=="number"} | {Token.string=="_"})
-  )?
-
+( {Token.string == "#"}
+  ({Token.kind == "number"})?
+  ({Token.kind=="word"}|{Token.string == "_"})
+  ({Token.kind=="word"}|{Token.kind=="number"}|{Token.string == "_"})[0,10]
 ): match
 -->
 :match {
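
For readers who do not use JAPE, the new Hashtag pattern can be approximated
at the token level as follows. This is only a sketch, not the plugin's code:
the (kind, string) tuples are a hypothetical stand-in for GATE Token
annotations, and match_hashtag is an illustrative name. It assumes the
tokeniser emits letters, digits and punctuation as separate tokens.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a GATE Token annotation: (kind, string).
Token = Tuple[str, str]


def is_word_or_underscore(tok: Token) -> bool:
    """Mirrors ({Token.kind=="word"}|{Token.string == "_"})."""
    return tok[0] == "word" or tok[1] == "_"


def is_tag_part(tok: Token) -> bool:
    """Mirrors ({Token.kind=="word"}|{Token.kind=="number"}|{Token.string == "_"})."""
    return tok[0] in ("word", "number") or tok[1] == "_"


def match_hashtag(tokens: List[Token], start: int) -> Optional[int]:
    """Return the exclusive end index of a hashtag starting at `start`, or None."""
    i = start
    # The pattern must begin with a "#" token.
    if i >= len(tokens) or tokens[i][1] != "#":
        return None
    i += 1
    # Optional leading number token.
    if i < len(tokens) and tokens[i][0] == "number":
        i += 1
    # Mandatory word or underscore, so a tag cannot be just a number.
    if i >= len(tokens) or not is_word_or_underscore(tokens[i]):
        return None
    i += 1
    # Up to 10 further word, number or underscore tokens, as in [0,10].
    extra = 0
    while extra < 10 and i < len(tokens) and is_tag_part(tokens[i]):
        i += 1
        extra += 1
    return i
```

Under these assumptions a "#" followed only by a number token is rejected,
and a hyphen token ends the tag, which is the behaviour the new comments
describe.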

_______________________________________________
GATE-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/gate-cvs