[tor-dev] CAPTCHA Monitoring Project Updates & Findings

2020-07-09 Thread Barkin Simsek
Hi everyone,

I have made progress on the Cloudflare CAPTCHA Monitoring project since my
last email, and I wanted to share some updates and findings. This year, the
Tor Project is participating in GSoC under the DIAL umbrella, and I have
been posting weekly updates to the DIAL blog [1]. I have started mirroring
these updates [2] to my project's wiki page, and I will also be posting
more frequent updates here.

a) Updates:
Firstly, I moved the project's wiki page, code base, and issue tracker to
the Tor Project's GitLab. They all live in the same GitLab project. You can
find detailed information about the project on the wiki page [3] and leave
comments and suggestions by opening issues in that repository.

Secondly, I got a fully functioning system up and running. The system
fetches various URLs with Tor Browser and with Firefox over Tor, and checks
for CAPTCHAs. It also checks whether any third-party code was injected by
comparing the hash of the received page with an expected hash value. It
repeats these experiments using different exit relays and records the
results. You can view the results on the dashboard [4] I created. I'm
looking for more URLs to track for CAPTCHAs. Feel free to share the
websites you frequently visit and get CAPTCHAs on, so that I can track them
with this tool as well. I want to experiment with all types of CAPTCHAs, so
these URLs don't have to be fronted by Cloudflare.
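To make the two per-page checks concrete, here is a minimal sketch of how one fetched page could be classified. It is not the CAPTCHA Monitor's actual code. The function names, the choice of SHA-256, and the CAPTCHA marker strings are all assumptions made for illustration. Hashing the whole body only makes sense for pages whose content is expected to be static.

```python
import hashlib

# Hypothetical placeholder markers (an assumption for illustration); the real
# CAPTCHA Monitor may detect CAPTCHA pages in a completely different way.
CAPTCHA_MARKERS = ("captcha-container", "g-recaptcha", "h-captcha")


def sha256_hex(page: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a fetched page body."""
    return hashlib.sha256(page).hexdigest()


def check_page(page: bytes, expected_hash: str) -> dict:
    """Classify one fetched page body.

    captcha:  True if any placeholder CAPTCHA marker occurs in the body.
    modified: True if the body's hash differs from the expected hash, which
              may mean injected third-party code (or just a dynamic page).
    """
    text = page.decode("utf-8", errors="replace")
    return {
        "captcha": any(marker in text for marker in CAPTCHA_MARKERS),
        "modified": sha256_hex(page) != expected_hash,
    }


# Usage: the expected hash comes from a known-good copy of the page.
reference = b"<html><body>Hello</body></html>"
expected = sha256_hex(reference)
print(check_page(reference, expected))  # {'captcha': False, 'modified': False}
```

In this sketch, a CAPTCHA page also counts as "modified", because its body differs from the reference copy. Keeping the two flags separate lets the results tell "blocked" apart from "altered".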

b) Findings:
So far, I have observed that using the Tor Browser Bundle out of the box,
without changing its configuration, doesn't lead to a high CAPTCHA rate on
Cloudflare-fronted websites (assuming the website owners don't explicitly
block exit relays [5]). That said, modifying the user agent, or making any
other change that makes your browser's fingerprint deviate from that of a
typical Tor Browser user, significantly increases the chance of getting
CAPTCHAs. For example, using regular Firefox over Tor resulted in CAPTCHAs
in ~90% of the measurements. I believe Cloudflare is very aggressive toward
"Firefox over Tor" users because many people, unfortunately, use
Chromium/Firefox + Selenium + Tor to scrape web pages and bypass IP-based
rate limits. That's why I'm interested in hearing about your specific
browser/Tor configurations, so that I can test them with the CAPTCHA
Monitor. Not everyone is affected in the same way, because of these
differences in how we use Tor, but by experimenting we can learn which
differences affect the CAPTCHA rate more than others.
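A figure like "CAPTCHAs in ~90% of the measurements" is a per-configuration ratio: CAPTCHA hits divided by total measurements. The sketch below shows that aggregation under an assumed record shape of (configuration name, got CAPTCHA). The configuration names and the sample data are made-up toy inputs, not measurement results.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def captcha_rates(results: Iterable[Tuple[str, bool]]) -> dict:
    """Return the fraction of measurements that hit a CAPTCHA, per configuration.

    Each result is a (configuration_name, got_captcha) pair; the record shape
    is an assumption for this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for config, got_captcha in results:
        totals[config] += 1
        hits[config] += int(got_captcha)
    return {config: hits[config] / totals[config] for config in totals}


# Toy input with hypothetical configuration names (not real data).
sample = [("config-a", False), ("config-a", True), ("config-a", False),
          ("config-a", False), ("config-b", True), ("config-b", True)]
print(captcha_rates(sample))  # {'config-a': 0.25, 'config-b': 1.0}
```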

Additionally, I observed that the TLS fingerprint plays a significant role
in whether someone gets a CAPTCHA. As part of the project, I decided to
capture the HTTP headers during measurements to understand how they affect
CAPTCHA rates. Initially, I used a Python library called seleniumwire to
capture the HTTP headers by intercepting the traffic between Tor Browser
and Tor. Doing this, I got a very high CAPTCHA rate: CAPTCHAs about 98% of
the time. Although seleniumwire forwards the traffic transparently, it
presents a different TLS fingerprint than Tor Browser. I figured out that
this difference in TLS fingerprints was triggering the MITM detection on
Cloudflare's side, which resulted in the very high CAPTCHA rates.
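The email does not say how Cloudflare compares TLS fingerprints. To show what a "TLS fingerprint" can look like, here is a sketch of JA3, one widely used public method. JA3 joins the ClientHello version, cipher suites, extensions, elliptic curves and point formats into a string, skips GREASE values, and takes the MD5 of that string. Whether Cloudflare uses JA3 or anything like it is an assumption. The ClientHello values in the usage line are illustrative only.

```python
import hashlib
from typing import List, Tuple


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version: int, ciphers: List[int], extensions: List[int],
        curves: List[int], point_formats: List[int]) -> Tuple[str, str]:
    """Build a JA3 string from already-parsed ClientHello fields.

    Returns (ja3_string, md5_hex). Fields are decimal values joined with "-",
    and the five fields are joined with ",".
    """
    def join(values: List[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values; 0x0A0A is a GREASE cipher and is dropped.
s, digest = ja3(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0])
print(s)  # 771,4865-4866,0-23,29-23,0
```

Even reordering two cipher suites changes the hash. That is why a proxy with its own TLS stack, like the seleniumwire setup above, can look very different from Tor Browser.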

Interestingly, when I tried the exact same Tor Browser and seleniumwire
setup without Tor, I got practically no CAPTCHAs. I believe the MITM
detection is more aggressive when the traffic comes through an exit relay.
So, I stopped using seleniumwire to capture headers, because it didn't
reflect what a real human Tor Browser user usually experiences. Feel free
to use the sample code [6] that I used to combine seleniumwire and Tor if
you are interested in experimenting further with this.
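The following is not the gist in [6]. It is a separate minimal sketch of routing seleniumwire's upstream traffic through Tor and reading the captured headers. Several assumptions apply. It expects selenium-wire to be installed with its SOCKS upstream proxy support, Firefox and geckodriver to be available, and a tor daemon to be listening on 127.0.0.1:9050 (Tor Browser's bundled tor usually listens on 9150 instead). The helper names are hypothetical.

```python
# Assumption: a standalone tor daemon on its default SOCKS port. "socks5h"
# asks the proxy (tor) to resolve DNS names instead of resolving locally.
TOR_SOCKS = "socks5h://127.0.0.1:9050"


def seleniumwire_tor_options(socks_url: str = TOR_SOCKS) -> dict:
    """Build seleniumwire options that send upstream traffic through Tor."""
    return {
        "proxy": {
            "http": socks_url,
            "https": socks_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def capture_headers(url: str) -> None:
    """Load url in Firefox via seleniumwire + Tor and print response headers.

    Requires selenium-wire, Firefox, geckodriver and a running tor daemon.
    """
    # Imported here so the options helper above can be used without it.
    from seleniumwire import webdriver

    driver = webdriver.Firefox(seleniumwire_options=seleniumwire_tor_options())
    try:
        driver.get(url)
        for request in driver.requests:
            if request.response:
                print(request.url, request.response.status_code)
                print(request.response.headers)
    finally:
        driver.quit()
```

Calling capture_headers("https://check.torproject.org/") would print every intercepted response. seleniumwire terminates TLS itself and opens its own upstream connection. As a result, the server sees seleniumwire's TLS fingerprint rather than the browser's, which is exactly the effect described above.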

c) Next:
I will work on collecting more metrics by testing more configurations and
websites. I will create a "Relay Search" section on the dashboard, where
CAPTCHA statistics for the relays (exit relays for now) will be available.
I will also work on using the collected data to predict the probability of
getting CAPTCHAs with a given exit relay and configuration/setup.
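The email does not say which model will be used for prediction. A simple baseline for per-relay statistics is a smoothed CAPTCHA rate. The sketch below swaps in Beta-binomial smoothing (the posterior mean under a Beta prior), so that relays with only a few measurements are not reported as 0% or 100%. The field names and the default prior are assumptions. Configuration could be added as a second key in the same way.

```python
from dataclasses import dataclass


@dataclass
class RelayStats:
    fingerprint: str  # exit relay fingerprint (hypothetical field name)
    captchas: int     # measurements through this relay that hit a CAPTCHA
    total: int        # all measurements through this relay


def captcha_probability(stats: RelayStats, prior_rate: float = 0.5,
                        prior_weight: float = 2.0) -> float:
    """Posterior-mean estimate of P(CAPTCHA) for one exit relay.

    Uses a Beta prior with mean prior_rate and a strength of prior_weight
    pseudo-measurements, so sparse relays are pulled toward the prior.
    """
    alpha = prior_rate * prior_weight
    beta = (1.0 - prior_rate) * prior_weight
    return (stats.captchas + alpha) / (stats.total + alpha + beta)


# Toy counts, not real data: 3 CAPTCHAs in 4 measurements.
print(round(captcha_probability(RelayStats("relay-a", 3, 4)), 3))  # 0.667
```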

Best,
Barkin

[1] https://hub.osc.dial.community/t/tor-project-cloudflare-captcha-monitoring/1558
[2] https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/Updates
[3] https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/home
[4] https://dashboard.captcha.wtf/
[5] Cloudflare has a setting to block all traffic originating from the Tor
network, but that setting is not "turned on" by default.
[6] https://gist.github.com/woswos/38b921f0b82de009c12c6494db3f50c5
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Gitolite to Gitlab Sync Change

2020-07-09 Thread Alexander Færøy
Hello folks!

In an attempt to solve Gitlab#41 [1], the hook that is executed when you
push to git.torproject.org to synchronize with Gitlab has been modified so
that it no longer prunes references in Gitlab that are missing from the
Gitolite repository.
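What "pruning" means here can be shown with a conceptual model. This is not the actual hook. In a mirror-style sync with pruning, every ref on the target that the source lacks is deleted (`git push --mirror` behaves this way). The `refs/merge-requests/*` refs exist only on the Gitlab side, so pruning removed them on every push. The ref names and commit IDs below are placeholders.

```python
from typing import Dict, Set


def sync_plan(source: Dict[str, str], target: Dict[str, str],
              prune: bool) -> Dict[str, Set[str]]:
    """Compute which refs a sync from source to target would update or delete.

    source/target map ref names to commit IDs. With prune=True, refs that
    exist only on the target are deleted; with prune=False they are kept.
    """
    update = {ref for ref, sha in source.items() if target.get(ref) != sha}
    delete = set(target) - set(source) if prune else set()
    return {"update": update, "delete": delete}


gitolite = {"refs/heads/main": "a1"}
gitlab = {"refs/heads/main": "a0", "refs/heads/old": "b0",
          "refs/merge-requests/7/head": "c0"}
print(sorted(sync_plan(gitolite, gitlab, prune=True)["delete"]))
# ['refs/heads/old', 'refs/merge-requests/7/head']
print(sync_plan(gitolite, gitlab, prune=False)["delete"])  # set()
```

The second call models the new behaviour: merge-request refs survive, and so do branches deleted on git.torproject.org. That matches both consequences listed below.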

This /should/ have the following implications:

- We no longer delete the "refs/merge-requests/*" namespace each time
  someone pushes to your repository on git.torproject.org. This should
  allow people to use the `git mr` alias that can be found online, which
  makes local code reviews easier and also makes manual merging from the
  command line easier.

- We no longer delete branches on Gitlab automatically: if a branch is
  deleted on git.torproject.org, it now has to be deleted manually on
  Gitlab.
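The email does not spell out the `git mr` alias. Assuming it is the commonly shared one, the alias fetches a merge request's head ref into a local branch and then checks that branch out. The sketch below reproduces those two git commands from Python. The helper names and the local branch naming scheme are assumptions.

```python
import subprocess
from typing import List


def mr_commands(remote: str, mr_id: int) -> List[List[str]]:
    """Git commands equivalent to the commonly shared `git mr <remote> <id>` alias."""
    local_branch = f"mr-{remote}-{mr_id}"  # hypothetical naming scheme
    return [
        # Fetch the merge request's head ref (kept on Gitlab under
        # refs/merge-requests/<id>/head) into a local branch.
        ["git", "fetch", remote, f"refs/merge-requests/{mr_id}/head:{local_branch}"],
        ["git", "checkout", local_branch],
    ]


def checkout_mr(remote: str, mr_id: int) -> None:
    """Run the commands in the current repository, stopping at the first failure."""
    for cmd in mr_commands(remote, mr_id):
        subprocess.run(cmd, check=True)
```

Running checkout_mr("origin", 42) inside a clone would leave you on a branch named mr-origin-42. This only works now that pushes to git.torproject.org no longer delete those refs.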

Please report any issues that you might discover on the ticket.

Thanks to Hiro for getting this running!

All the best,
Alex.

[1]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/41

-- 
Alexander Færøy


Re: [tor-dev] Gitlab CI runners available for experimentation on gitlab.torproject.org

2020-07-09 Thread Hans-Christoph Steiner

Happy to help!  I'm a big fan of gitlab-ci since it is a collection of
standard tools like Docker, YAML, bash, etc. It takes a bit more effort to
learn than Travis-CI, but it pays off by being more flexible and simpler to
set up.  E.g. it is easy to start with a plain, base Debian image, install
only the requirements, and run tests from there.  No intermediate layer.
And using YAML templates, it is possible to reuse chunks of code. In the Go
setup I'm using right now for snowflake, I have a template for the install
and test run.  Then it's trivial to add/change the base image between
Debian derivatives of various releases.
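The snowflake CI config isn't shown here, so the following is a hypothetical sketch of the template idea. It renders a .gitlab-ci.yml in which a hidden template job (GitLab CI skips jobs whose names start with ".") holds the install and test steps. Each base image then gets a small job that reuses the template through GitLab CI's `extends` keyword. YAML anchors are another way to reuse chunks, and the email doesn't say which one is used. The package names, the `go test ./...` command and the image list are assumptions.

```python
from typing import Iterable

# Hypothetical hidden template job holding the shared install + test steps.
TEMPLATE = """\
.go-test:
  script:
    - apt-get update
    - apt-get install -y --no-install-recommends golang-go git ca-certificates
    - go test ./...
"""


def job_name(image: str) -> str:
    """Turn an image such as "debian:buster" into a job name like "test-debian-buster"."""
    return "test-" + image.replace(":", "-").replace("/", "-")


def render_ci(images: Iterable[str]) -> str:
    """Render .gitlab-ci.yml text with one job per base image, all reusing .go-test."""
    parts = [TEMPLATE]
    for image in images:
        parts.append(f"{job_name(image)}:\n  extends: .go-test\n  image: {image}\n")
    return "\n".join(parts)


# Example: the same install/test steps on several Debian-family releases.
print(render_ci(["debian:buster", "debian:bullseye", "ubuntu:focal"]))
```

Adding or changing a base image then means editing one list entry, which is the convenience described above.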

I should also add: these are actually F-Droid runners, not Guardian
Project ones.  F-Droid has a bare-metal server (16-core/32-thread, 142GB
RAM, 3TB disk) that could be dedicated to gitlab runners and shared with
Tor Project cost-free.  We just need someone to admin it.  We have an
almost complete setup with ansible.  @uniqx and I would happily help
someone finish that setup if there were someone to make sure it stays
updated and running.  (I'm personally already admining more servers than
I should be.)

Also, these runners have KVM and privileged mode enabled, so you can run
any KVM VM in the gitlab-ci jobs.  Docker too.

.hc

Alexander Færøy:
> Hello folks!
> 
> Hans from The Guardian Project added his CI runners to our Gitlab
> instance. They look like pretty fast machines that allow each team
> to experiment with Gitlab CI on our Gitlab instance.
> 
> Hans says that the runners have no uptime promises or anything like
> that, so if they are down they are down :-)
> 
> Here's some documentation for getting started:
> https://docs.gitlab.com/ee/ci/
> 
> Thanks to Hans for this!
> 
> All the best,
> Alex.
> 

-- 
PGP fingerprint: EE66 20C7 136B 0D2C 456C  0A4D E9E2 8DEA 00AA 5556
https://pgp.mit.edu/pks/lookup?op=vindex=0xE9E28DEA00AA5556