I see.
Thanks for your answer and suggestions.

I think it could be beneficial (in terms of structure in the data set), and
more reasonable, to keep a separate view count for content alongside the
current article titles. That could be useful, for example, to outsiders like
us for measuring how people reach Wikipedia articles, whether through a
Google search or through old links, with a margin of error of course.
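For what it's worth, the per-article Pageviews REST API already splits a title's traffic by agent type (user vs. spider vs. automated), which gives a rough handle on how much of a redirect title's traffic is bots. A minimal sketch, assuming the public wikimedia.org REST endpoint; the helper names and the `redirect-study` User-Agent string are my own, not an official tool:

```python
# Sketch: fetch daily pageviews for one title, split by agent type,
# using the Wikimedia Pageviews REST API. Helper names are my own.
import json
import urllib.request
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def per_article_url(project, title, start, end,
                    access="all-access", agent="user"):
    """Build a per-article pageviews URL; start/end are YYYYMMDD dates."""
    return "/".join([BASE, project, access, agent,
                     quote(title, safe=""), "daily", start, end])

def total_views(payload):
    """Sum the 'views' field over the response's 'items' list."""
    return sum(item["views"] for item in payload.get("items", []))

def fetch_total(project, title, start, end, agent="user"):
    """Total views for one title over a date range (makes a network call)."""
    url = per_article_url(project, title, start, end, agent=agent)
    req = urllib.request.Request(
        url, headers={"User-Agent": "redirect-study/0.1"})
    with urllib.request.urlopen(req) as resp:
        return total_views(json.load(resp))

# Example (requires network): compare human vs. bot traffic on the
# redirect title from this thread.
# for agent in ("user", "spider", "automated"):
#     print(agent, fetch_total("pt.wikipedia", "Adição_de_Segmentos",
#                              "20200101", "20200131", agent=agent))
```

Running the same query for a redirect title and for its target would show whether the redirect is still receiving human traffic at all.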

Anyway, thanks again!




Marco Antonio

Pure Mathematics undergraduate at USP | Science Communicator

<https://www.facebook.com/ViaSaber> <https://www.linkedin.com/in/magcastro/>
<https://www.instagram.com/marcoantoniograziano/>




On Fri, Feb 7, 2020 at 4:43 AM Brian Wolff <bawo...@gmail.com> wrote:

> 2. No.
> 4. You would have to figure out all the redirects and sum them. The API
> allows you to fetch the list of redirects. Another option is the redirect
> table available from the database dumps at download.wikimedia.org.
>
>
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews/Redirects
> might be helpful (it's a bit old; I assume it's still accurate).
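Brian's suggestion can be sketched as follows (a sketch under my own assumptions, not an official tool; helper names are mine): list every redirect pointing at a page with `action=query&prop=redirects`, then sum pageviews over the target title plus each redirect title.

```python
# Sketch: enumerate all redirects to a target page via the action API.
# Helper names are my own; a real script would also follow the API's
# 'continue' parameter when a page has more redirects than fit in one
# response.
import json
import urllib.request
from urllib.parse import urlencode

def redirects_query_url(api_endpoint, title):
    """URL for action=query&prop=redirects on one target title."""
    params = {"action": "query", "prop": "redirects", "titles": title,
              "rdlimit": "max", "format": "json"}
    return api_endpoint + "?" + urlencode(params)

def redirect_titles(payload):
    """Pull redirect titles out of an action-API query response."""
    titles = []
    for page in payload.get("query", {}).get("pages", {}).values():
        for rd in page.get("redirects", []):
            titles.append(rd["title"])
    return titles

def fetch_redirects(api_endpoint, title):
    """Fetch the redirect list for one title (makes a network call)."""
    url = redirects_query_url(api_endpoint, title)
    req = urllib.request.Request(
        url, headers={"User-Agent": "redirect-study/0.1"})
    with urllib.request.urlopen(req) as resp:
        return redirect_titles(json.load(resp))

# Example (requires network): titles whose pageviews you would sum
# together with the target's own count.
# print(fetch_redirects("https://pt.wikipedia.org/w/api.php",
#                       "Adição_de_segmentos"))
```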
> --
> Bawolff
>
> On Thursday, February 6, 2020, Marco Antonio <mgrazianodecas...@gmail.com>
> wrote:
>
>> This answers my first and third questions, but the second and fourth are
>> still an issue for me.
>>
>>
>>
>>
>> Marco Antonio
>>
>> Pure Mathematics undergraduate at USP | Science Communicator
>>
>> <https://www.facebook.com/ViaSaber>
>> <https://www.linkedin.com/in/magcastro/>
>> <https://www.instagram.com/marcoantoniograziano/>
>>
>>
>>
>>
>> On Fri, Feb 7, 2020 at 12:04 AM Brian Wolff <bawo...@gmail.com> wrote:
>>
>>> There are a variety of reasons why someone might view a redirected title:
>>> * following a link still using the old title. (Either internally or
>>> externally)
>>> * typing the old name exactly in the search bar
>>> * typing old name in address bar
>>>
>>> --
>>> Brian
>>>
>>> On Thursday, February 6, 2020, Marco Antonio <
>>> mgrazianodecas...@gmail.com> wrote:
>>>
>>>> Hi folks.
>>>>
>>>> *This is my first time using this mailing list*, so if this is not the
>>>> right place to ask this kind of question, please let me know how I
>>>> should proceed.
>>>>
>>>>
>>>> *Question*
>>>> I have downloaded a lot of mathematics-related pages from the MediaWiki
>>>> API. Some of them are just *duplicates of the same article*, the only
>>>> difference being their titles: a different way of naming the same
>>>> subject, a letter that differs from one title to another, and so on.
>>>>
>>>> One example that I can show you right away is:
>>>>
>>>>    - "Adição_de_*s*egmentos", and
>>>>    - "Adição_de_*S*egmentos",
>>>>
>>>> both written in Portuguese (my native language). The only difference
>>>> between the titles is the lowercase versus uppercase letter "s". As I
>>>> was testing the URLs, it seems that *they are both the same article,
>>>> with different links redirecting to the official "title".*
>>>>
>>>> Keeping those kinds of duplicates in mind, when I started *to analyse
>>>> the view statistics of a specific article*, going through its cases, I
>>>> was expecting to receive the following structure of data:
>>>>
>>>>
>>>>    - The old (deprecated) titles would hold views until some day X, and
>>>>    then would have nothing further to count and show;
>>>>    - The up-to-date titles would have data starting from day X and
>>>>    holding until the last day I want to analyse.
>>>>
>>>>
>>>> Nothing too crazy to expect from the database. But that was not what
>>>> happened. *There are plenty of articles that are still receiving views
>>>> even though they all redirect to another article*. At first, I thought
>>>> people were reaching the articles' content through different links
>>>> available on search engines such as Google, so all views must be
>>>> independent from one another. The problem is, after testing different
>>>> searches on Google for *the same Wikipedia article, I can only reach
>>>> the up-to-date articles, not the old ones.*
>>>>
>>>>
>>>>    1. How can this be possible?
>>>>    2. More importantly for me, are all accesses to the deprecated
>>>>    articles made by bots, or by old links still available on pages of
>>>>    other sites?
>>>>    3. Are the view counts for an article's different titles independent?
>>>>    4. If so, how could I track all the possible accesses to a
>>>>    particular subject to create an effective study of it?
>>>>
>>>>
>>>> Anyway, this is (if I remember well) the fourth time I'm trying to get
>>>> a proper answer to my question, and I'm hoping I'll get it soon.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> Marco Antonio
>>>>
>>>> Pure Mathematics undergraduate at USP | Science Communicator
>>>>
>>>> <https://www.facebook.com/ViaSaber>
>>>> <https://www.linkedin.com/in/magcastro/>
>>>> <https://www.instagram.com/marcoantoniograziano/>
>>>>
>>>>
>>>> _______________________________________________
>>> Mediawiki-api mailing list
>>> Mediawiki-api@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>>
