GitHub user chenlica created a discussion: Queries and Datasets (from old wiki)
From the page https://github.com/apache/texera/wiki/Queries-and-Datasets (may be dangling)

====

# Datasets

## 1. A snippet of the Twitter dataset.

Each tweet is stored in JSON format. To view the JSON in a readable form, we suggest an online JSON viewer such as [JsonViewer](http://jsonviewer.stack.hu/).

> <pre><code>{"create_at": "2017-03-26T16:39:13.000Z", "id": 846144537676431360, "text": "@carrieunderwood @opry hi carrie", "in_reply_to_status": 845836315648315392, "in_reply_to_user": 386244525, "favorite_count": 0, "retweet_count": 0, "lang": "en", "is_retweet": false, "user_mentions": [386244525, 19772559], "user": {"id": 4217866818, "name": "Lisa", "screen_name": "296_3676", "profile_image_url": "http://pbs.twimg.com/profile_images/664966945435992065/Tw4npe2S_normal.jpg", "lang": "en", "location": "null", "create_at": "2015-11-12", "description": "null", "followers_count": 31, "friends_count": 67, "statues_count": 182}, "place": {"country": "United States", "country_code": "United States", "full_name": "Beaver Dam, WI", "id": "1389f2635209d576", "name": "Beaver Dam", "place_type": "city", "bounding_box": [[-88.870587, 43.431528], [-88.786438, 43.508406]]}, "geo_tag": {"stateID": 55, "stateName": "Wisconsin", "countyID": 55027, "countyName": "Dodge", "cityID": 5505900, "cityName": "Beaver Dam"}}</code></pre>

> <pre><code>{"create_at": "2017-08-09T11:22:09.000Z", "id": 895349496749596672, "text": "Join the Noodles & Co. team! See our latest #job opening here: https://t.co/NljQxBaLc0 #Veterans #MilSpouse #Greenville, NC #Hiring", "in_reply_to_status": -1, "in_reply_to_user": -1, "favorite_count": 0, "coordinate": [-77.3818152, 35.5794052], "retweet_count": 0, "lang": "en", "is_retweet": false, "hashtags": ["job", "Veterans", "MilSpouse", "Greenville", "Hiring"], "user": {"id": 88254516, "name": "TMJ-NC HRTA Jobs", "screen_name": "tmj_nc_hrta", "profile_image_url": "http://pbs.twimg.com/profile_images/667871532920639488/VroXHje4_normal.jpg", "lang": "en", "location": "North Carolina", "create_at": "2009-11-07", "description": "Follow this account for geo-targeted Hospitality/Restaurant/Tourism job tweets in North Carolina. Need help? Tweet us at @CareerArc!", "followers_count": 399, "friends_count": 275, "statues_count": 474}, "place": {"country": "United States", "country_code": "United States", "full_name": "North Carolina, USA", "id": "3b98b02fba3f9753", "name": "North Carolina", "place_type": "admin", "bounding_box": [[-84.321948, 33.752879], [-75.40012, 36.588118]]}, "geo_tag": {"stateID": 37, "stateName": "North Carolina", "countyID": 37147, "countyName": "Pitt", "cityID": 3728080, "cityName": "Greenville"}}</code></pre>

> <pre><code>{"create_at": "2017-06-12T10:05:40.000Z", "id": 874311754897137666, "text": "I told Tommy he had an obsession to something and he goes \"you're my obsession\" and wow it was so cute I love him so much\ufffd\ufffd", "in_reply_to_status": -1, "in_reply_to_user": -1, "favorite_count": 0, "retweet_count": 0, "lang": "en", "is_retweet": false, "user": {"id": 398387501, "name": "Amber Hargis", "screen_name": "AmberHargis", "profile_image_url": "http://pbs.twimg.com/profile_images/873390596974669824/2P3J0Hiw_normal.jpg", "lang": "en", "location": "Columbus, OH", "create_at": "2011-10-25", "description": "@tommymalone2\u2764\ufe0f", "followers_count": 825, "friends_count": 912, "statues_count": 9757}, "place": {"country": "United States", "country_code": "United States", "full_name": "Gahanna, OH", "id": "c97807ac2cd60207", "name": "Gahanna", "place_type": "city", "bounding_box": [[-82.905845, 39.987076], [-82.802554, 40.05651]]}, "geo_tag": {"stateID": 39, "stateName": "Ohio", "countyID": 39049, "countyName": "Franklin", "cityID": 3929106, "cityName": "Gahanna"}}</code></pre>

## 2. A snippet of the COCO dataset.

> <pre><code>{"id": 10000, "text": "train2014/COCO_train2014_000000105363.jpg"}</code></pre>

> <pre><code>{"id": 10001, "text": "val2014/COCO_val2014_000000402233.jpg"}</code></pre>

> <pre><code>{"id": 10002, "text": "val2014/COCO_val2014_000000559252.jpg"}</code></pre>

## 3. A snippet of the UCF101 dataset.

> <pre><code>{"id": 5000, "text": "Haircut/v_Haircut_g20_c02.avi"}</code></pre>

> <pre><code>{"id": 5001, "text": "ApplyLipstick/v_ApplyLipstick_g19_c02.avi"}</code></pre>

> <pre><code>{"id": 5002, "text": "HandstandWalking/v_HandstandWalking_g02_c01.avi"}</code></pre>

# Queries

## 1. Ten queries on the Twitter dataset.

### To study the behavior of CORE with different numbers of predicates, we randomly select five queries with two strongly correlated predicates and five queries with three strongly correlated predicates.

Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| SentimentStanfordNLP ('negative', 'neutral') → POSTaggerSpacyLG ('VBD', 'WRB', 'IN')
q<sub>2</sub>| SentimentStanfordNLP ('negative', 'neutral') → POSTaggerSpacyLG ('PRP')
q<sub>3</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>4</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerSpacySM ('VBD', 'WRB', 'PRP')
q<sub>5</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerSpacySM ('PRP')
q<sub>6</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerStanfordNLP ('NNPS', 'VBP', 'WRB', '.') → POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')
q<sub>7</sub>| SentimentStanfordNLP ('positive') → POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN') → POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>8</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN') → POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')
q<sub>9</sub>| SentimentStanfordNLP ('neutral', 'positive') → POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN') → POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>10</sub>| SentimentStanfordNLP ('neutral') → POSTaggerStanfordNLP ('VBP', 'VBZ', 'WRB') → POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')

## 2. Ten queries on the COCO dataset.

### To study the behavior of CORE with different orders of predicates, we randomly select four pairs of queries. Each pair of queries contains two queries with different orders, such as q<sub>2</sub> and q<sub>3</sub>.

Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| ObjectDetection ('car', 'chair', 'dining table', 'bench', 'bed', 'bird', 'vase') → ObjectDetection ('person')
q<sub>2</sub>| ObjectDetection ('person') → ObjectDetection ('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza')
q<sub>3</sub>| ObjectDetection ('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza') → ObjectDetection ('person')
q<sub>4</sub>| ObjectDetection ('person') → ObjectDetection ('car', 'chair', 'cup', 'bottle', 'bed', 'cell phone', 'motorcycle')
q<sub>5</sub>| ObjectDetection ('car', 'chair', 'cup', 'bottle', 'bed', 'cell phone', 'motorcycle') → ObjectDetection ('person')
q<sub>6</sub>| ObjectDetection ('person') → ObjectDetection ('car', 'chair', 'cup', 'tv', 'bed', 'bench', 'sink')
q<sub>7</sub>| ObjectDetection ('car', 'chair', 'cup', 'tv', 'bed', 'bench', 'sink') → ObjectDetection ('person')
q<sub>8</sub>| ObjectDetection ('person') → ObjectDetection ('car', 'chair', 'bottle', 'bowl', 'handbag', 'book', 'bird')
q<sub>9</sub>| ObjectDetection ('car', 'chair', 'bottle', 'bowl', 'handbag', 'book', 'bird') → ObjectDetection ('person')
q<sub>10</sub>| ObjectDetection ('person') → ObjectDetection ('car', 'chair', 'dining table', 'book', 'surfboard', 'bird', 'vase')

## 3. Ten queries on the UCF101 dataset.

### For the UCF101 dataset, we randomly select ten queries with strong correlations.

Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'Biking', 'BreastStroke', 'BenchPress', 'BoxingPunchingBag', 'BlowDryHair', 'Bowling', 'BabyCrawling', 'ApplyLipstick') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>2</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'Biking', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'ApplyLipstick') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>3</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'Biking', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'Bowling', 'BabyCrawling') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>4</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BasketballDunk', 'BlowingCandles', 'Biking', 'BreastStroke', 'BrushingTeeth', 'BlowDryHair', 'BoxingSpeedBag') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>5</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BlowingCandles', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BenchPress', 'BlowDryHair', 'BoxingSpeedBag', 'Bowling') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>6</sub>| ActivityRecognition ('Archery', 'Basketball', 'BandMarching', 'BasketballDunk', 'Biking', 'BodyWeightSquats', 'BreastStroke', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'ApplyLipstick') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>7</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'BabyCrawling') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>8</sub>| ActivityRecognition ('Archery', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'Biking', 'BrushingTeeth', 'BaseballPitch', 'BenchPress', 'Bowling') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>9</sub>| ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'boat', 'cup') → ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BabyCrawling', 'ApplyLipstick')
q<sub>10</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BabyCrawling', 'ApplyLipstick') → ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'boat', 'cup')

GitHub link: https://github.com/apache/texera/discussions/3980
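
The query tables above write each query as a chain of predicates joined by →. As a minimal sketch of how that notation could be handled in a script, the Python below parses a query string into an ordered list of (operator, labels) pairs and checks whether two queries are the same predicates in opposite order, as with q<sub>2</sub> and q<sub>3</sub> on the COCO dataset. The names `parse_query` and `is_reversed_pair` are hypothetical helpers introduced here for illustration; they are not Texera or CORE APIs, and the parsing rules are an assumption based only on the notation shown in the tables.

```python
import re
from typing import List, Tuple

# One predicate: an operator name plus the labels it filters on.
Predicate = Tuple[str, Tuple[str, ...]]

# Assumed notation: OperatorName ('label', 'label', ...)
_PREDICATE = re.compile(r"(\w+)\s*\(([^)]*)\)")
_LABEL = re.compile(r"'([^']*)'")


def parse_query(text: str) -> List[Predicate]:
    """Split a query such as "A ('x')→ B ('y', 'z')" into ordered predicates."""
    predicates: List[Predicate] = []
    for step in text.split("→"):
        match = _PREDICATE.search(step)
        if match is None:
            raise ValueError(f"cannot parse predicate: {step!r}")
        operator, args = match.groups()
        predicates.append((operator, tuple(_LABEL.findall(args))))
    return predicates


def is_reversed_pair(a: List[Predicate], b: List[Predicate]) -> bool:
    """True when b applies the same predicates as a, in the opposite order."""
    return len(a) > 1 and a == b[::-1]


# COCO q2 and q3, copied from the table above.
q2 = parse_query(
    "ObjectDetection ('person')→ ObjectDetection "
    "('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza')"
)
q3 = parse_query(
    "ObjectDetection ('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza')"
    "→ ObjectDetection ('person')"
)
print(is_reversed_pair(q2, q3))
```

Keeping each predicate as a tuple makes the order comparison a plain list equality, which is enough to group the COCO queries into their order-swapped pairs.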
