Revision: 17247
http://sourceforge.net/p/gate/code/17247
Author: markagreenwood
Date: 2014-01-24 17:07:59 +0000 (Fri, 24 Jan 2014)
Log Message:
-----------
updated the jape for finding hashtags so that it is closer to how twitter
determines what a hashtag is
Modified Paths:
--------------
gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape
Modified: gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape
===================================================================
--- gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape 2014-01-24 16:43:33 UTC (rev 17246)
+++ gate/trunk/plugins/Twitter/resources/tokeniser/twitter.jape 2014-01-24 17:07:59 UTC (rev 17247)
@@ -17,19 +17,20 @@
// flag RT/MT/via @/by @ & via specially? (probably better to handle later)
-// Syntax of username & hashtag according to
+// Syntax of usernames
// http://sentiment.christopherpotts.net/tokenizing.html#twitter
// @+[\w_]+
-// \#+[\w_]+[\w_\-]*[\w_]+
+// and hashtags according to
+// https://support.twitter.com/articles/370610-my-hashtags-or-replies-aren-t-working
+// can include numbers but can't be just a number
+// punctuation breaks the tag; apart from underscores appear to be allowed
+
Rule: Hashtag
-( ({Token.string == "#"})
- ({Token.kind=="word"} | {Token.string=="_"}| {Token.string=="-"})
- (
- ({Token.kind=="word"} | {Token.kind=="number"} | {Token.string=="_"}| {Token.string=="-"})[0,5]
- ({Token.kind=="word"} | {Token.kind=="number"} | {Token.string=="_"})
- )?
-
+( {Token.string == "#"}
+ ({Token.kind == "number"})?
+ ({Token.kind=="word"}|{Token.string == "_"})
+ ({Token.kind=="word"}|{Token.kind=="number"}|{Token.string == "_"})[0,10]
): match
-->
:match {
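
To illustrate what the revised Hashtag pattern accepts, here is a minimal Python sketch of the same token sequence: "#", an optional number token, then a word or underscore, then up to ten further word, number, or underscore tokens. It is not GATE code. The tokenise, match_hashtag, and find_hashtags helpers are hypothetical names, the tokeniser is a rough stand-in for GATE's, and the sketch assumes whitespace breaks a tag, which depends on the grammar's Input declaration (not shown in this diff).

```python
import re

# Rough stand-in for the tokeniser: runs of letters, runs of digits,
# runs of whitespace, or any other single character (assumption).
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|\s+|.", re.DOTALL)


def tokenise(text):
    """Split text into (kind, string) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        s = m.group()
        if s.isspace():
            kind = "space"
        elif s.isdigit():
            kind = "number"
        elif s.isalpha():
            kind = "word"
        else:
            kind = "punctuation"
        tokens.append((kind, s))
    return tokens


def is_word_or_underscore(tok):
    return tok[0] == "word" or tok[1] == "_"


def is_tag_part(tok):
    return tok[0] in ("word", "number") or tok[1] == "_"


def match_hashtag(tokens, start):
    """Return the index just past a hashtag starting at `start`, or None."""
    i, n = start, len(tokens)
    if i >= n or tokens[i][1] != "#":
        return None
    i += 1
    # Optional leading number, mirroring ({Token.kind == "number"})?
    if i < n and tokens[i][0] == "number":
        i += 1
    # A word or underscore is mandatory, so "#123" alone is rejected.
    if i >= n or not is_word_or_underscore(tokens[i]):
        return None
    i += 1
    # Up to ten more word, number, or underscore tokens, mirroring [0,10].
    extra = 0
    while i < n and extra < 10 and is_tag_part(tokens[i]):
        i += 1
        extra += 1
    return i


def find_hashtags(text):
    """Return the hashtag strings found in text, left to right."""
    tokens = tokenise(text)
    found, i = [], 0
    while i < len(tokens):
        end = match_hashtag(tokens, i)
        if end is None:
            i += 1
        else:
            found.append("".join(s for _, s in tokens[i:end]))
            i = end
    return found
```

Under these assumptions, a tag such as "#2014goals" is accepted, "#2014" on its own is not, and a hyphen ends the tag, so "#foo-bar" yields "#foo".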