GitHub user chenlica closed a discussion: SystemT Rewriter (from old wiki)

>From the old page https://github.com/apache/texera/wiki/SystemT-Rewriter (may 
>be dangling)

====

**Authors:** Qing Tang (qingt AT uci dot edu), Jinggang Diao (diaojinggang AT 
gmail dot edu), Flavio Bayer (flaviorbayer AT gmail dot edu).

Status: As of July 2, 2016, this task was not completed due to the complexity 
of the SystemT language and our limited amount of time. We will do a separate 
task to translate a SystemT query to a Texera plan.

**Progress:**


5/2: We selected some regex test cases and ran them with the SystemT 
implementation. The results are here: 
https://drive.google.com/file/d/0B1FdPBs0KkvxYVAwVDF2dDRFdGM/view?usp=sharing 
(P.S. Zuozhi ran the same tests on his laptop with the Lucene implementation.)

The following is a rough grammar that we have now:
https://drive.google.com/file/d/0B1FdPBs0KkvxdU9jVXRqSnVJdFk/view?usp=sharing

We will try to add more detail to this grammar and then build the parser 
on top of it.


============================================================================================================================

4/17: We have completed a simple parser model; the code will be uploaded before 
Monday's lecture. I am currently trying to figure out how to manage the 
relationships between views, and I am also getting familiar with 
OperatorGraph. We have successfully installed JavaCC, and we think it may be 
useful for generating the parser once we have designed the grammar.

_**test case**: 
https://drive.google.com/a/uci.edu/file/d/0B1FdPBs0KkvxY2ZCclpILXBUbHM/view?usp=sharing_

_Parse result:_

_Dict List: []_

_Regex List: 
[(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:orber)?|Nov(?:ember)?|Dec(?:ember)?)
 (?:19[7-9]\d|2\d{3}), 
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:orber)?|Nov(?:ember)?|Dec(?:ember)?)\s(\d|[0-2]\d|3[0-1]),\s(19\d{2}|2\d{3}),
 (\d|0\d|1[0-2])\/(\d|[0-2]\d|3[0-1])\/(19\d{2}|2\d{3}|d{2})]_

_Name List: [DateFormat1, DateFormat2, DateFormat3, DateUnion]_

_Union List: [DateUnion, (, DateUnion, (]     <= This part is under 
consideration._
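
To illustrate the kind of output listed above, here is a minimal sketch of a view scanner. It is not the parser model we uploaded: the AQL input is a hypothetical, simplified stand-in for the linked test case, the function name `parse_views` and the result keys are made up for this example, and it only recognizes the statement shapes shown in the sample. It also records which views feed each union, which is one possible way to fill in the Union List.

```python
import re

# Hypothetical, simplified AQL-style input (NOT the linked test case).
AQL = r"""
create view DateFormat1 as
  extract regex /(?:19[7-9]\d|2\d{3})/ on D.text as date
  from Document D;

create view DateFormat2 as
  extract regex /(\d|0\d|1[0-2])\/(\d|[0-2]\d|3[0-1])/ on D.text as date
  from Document D;

create view DateUnion as
  (select F1.date as date from DateFormat1 F1)
  union all
  (select F2.date as date from DateFormat2 F2);
"""

# Patterns for the few statement shapes this sketch assumes.
VIEW_RE = re.compile(r"create\s+view\s+(\w+)\s+as\b", re.IGNORECASE)
# A regex literal is delimited by '/', with '\/' allowed inside it.
REGEX_RE = re.compile(r"extract\s+regex\s+/((?:\\.|[^/\\])+)/", re.IGNORECASE)
DICT_RE = re.compile(r"extract\s+dictionary\s+'([^']+)'", re.IGNORECASE)
FROM_RE = re.compile(r"\bfrom\s+(\w+)", re.IGNORECASE)
UNION_RE = re.compile(r"\bunion\s+all\b", re.IGNORECASE)


def parse_views(aql):
    """Collect view names, regexes, dictionaries, and union inputs."""
    result = {"dicts": [], "regexes": [], "names": [], "unions": {}}
    heads = list(VIEW_RE.finditer(aql))
    for i, head in enumerate(heads):
        # The body of a view runs until the next "create view" (or the end).
        end = heads[i + 1].start() if i + 1 < len(heads) else len(aql)
        body = aql[head.end():end]
        name = head.group(1)
        result["names"].append(name)
        result["regexes"].extend(REGEX_RE.findall(body))
        result["dicts"].extend(DICT_RE.findall(body))
        if UNION_RE.search(body):
            # Record the views that feed this union, in order of appearance.
            result["unions"][name] = FROM_RE.findall(body)
    return result


if __name__ == "__main__":
    for key, value in parse_views(AQL).items():
        print(key, value)
```

Scanning with regular expressions like this breaks down quickly on nested or more complex statements, which is why a grammar-based parser (for example one generated by JavaCC) is the more realistic route.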

============================================================================================================================
 
4/11: 
https://docs.google.com/presentation/d/1RAxF3ZyBCPOwrOvOqM5iQhJCnjw2iLVKg1qiyviT_UA/edit#slide=id.p

============================================================================================================================

[SystemT](http://researcher.watson.ibm.com/researcher/view_group.php?id=1264) 
is a software package developed by IBM to support powerful information 
extraction.

The purpose of this task is to write a parser for the SystemT language so that 
we can translate a SystemT query into a query that our Texera system can answer 
efficiently using its available indexing and query-processing capabilities.

**Resources:** 

* SystemT is available externally as BigInsights Text Analytics. You can get a 
copy of BigInsights from the following link: 
[http://www-01.ibm.com/support/docview.wss?uid=swg24040517](http://www-01.ibm.com/support/docview.wss?uid=swg24040517).

* The full specification of AQL can be found at 
[http://www.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.aqlref.doc/doc/aql-overview.html](http://www.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.aqlref.doc/doc/aql-overview.html).

* Students who are interested in building extractors via a UI can do so 
through BlueMix, as instructed here: 
http://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=6335

* SystemT is a proprietary product of IBM. With the kind support of our IBM 
colleagues, we can access it for educational purposes. If you want to access 
the package, please contact the instructor (Prof. Chen Li).

GitHub link: https://github.com/apache/texera/discussions/3986
