[tor-dev] Issue regarding IRC channel

2017-02-12 Thread Jaskaran Singh
Hi,

As of writing this mail, I can't access the tor-dev channel on OFTC,
either with my registered nick on a desktop client or through the web
interface. It looks like the channel has been made invite-only. It would
be great if someone could look into this.

Regards,
Jaskaran Veer Singh
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Proposing "Post-Quantum safe handshake implementation" as GSoc Project

2017-02-18 Thread Jaskaran Singh
Hi,

My name is Jaskaran, and I'm an Electronics Engineering undergrad in
India. I'm a privacy, anonymity and FOSS supporter. I worked with the
LibreOffice project during Google Summer of Code 2016. This year I'm
interested in working with the Tor Project as part of GSoC.

I'm particularly interested in working on making Tor handshakes
post-quantum safe. I feel this should be implemented as early as
possible, because adversaries could store network traffic today and
decrypt it later once sufficiently powerful quantum computers exist.

So here's what I think the task would comprise:

1. Add new CREATE2V and CREATED2V cells that can carry 2240 bytes of
HDATA (per the NewHope-Simple algorithm[1]), and add support for sending
multiple EXTEND2/EXTENDED2 cells when the handshake data doesn't fit
into a single cell (see the sketch after this list).

2. Implement the NewHope-Simple algorithm[1], since we won't be able to
use vanilla NewHope as it is covered by patents. I wasn't able to find
any existing implementation of NewHope-Simple. Could the vanilla NewHope
implementation be tweaked into NewHope-Simple, or would we have to write
it from the ground up? I don't know how the patent situation affects
that.

3. Finally, generate test vectors and check for bottlenecks. Improve
efficiency and check for any vulnerabilities in the implementation that
an adversary could exploit.
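
To make the first item concrete, here is a rough Python sketch of how an
oversized HDATA blob could be split across several EXTEND2-sized chunks
and reassembled on the other side. This is not actual Tor code; the
per-cell capacity and the function names are assumptions for
illustration only.

  ROOM_PER_CELL = 400   # placeholder; real room depends on the final cell layout

  def fragment_hdata(hdata, room=ROOM_PER_CELL):
      """Yield consecutive chunks of HDATA, one per EXTEND2/EXTENDED2 cell."""
      for offset in range(0, len(hdata), room):
          yield hdata[offset:offset + room]

  def reassemble_hdata(chunks, expected_len):
      """Concatenate received chunks and check them against the declared HLEN."""
      hdata = b"".join(chunks)
      if len(hdata) != expected_len:
          raise ValueError("HDATA length does not match HLEN")
      return hdata

  # NewHope-Simple sized client handshake blob (2240 bytes) -> 6 cells here
  client_hdata = bytes(2240)
  cells = list(fragment_hdata(client_hdata))
  assert reassemble_hdata(cells, 2240) == client_hdata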

I'd like to know your views on this. Suggestions, comments, criticism
are all welcome.

References

[1] https://eprint.iacr.org/2016/1157.pdf

PS: There's something I noticed while reading the proposal: the portions
don't add up to the size of the cell. Here's a signed fix for it.

commit a55692fcd93e3f064f1fffe24796dc747e4870e1
Author: Jaskaran Singh 
Date:   Sat Feb 18 13:32:32 2017 +0530

Fix HDATA sizes in proposal 270

diff --git a/proposals/270-newhope-hybrid-handshake.txt
b/proposals/270-newhope-hybrid-handshake.txt
index ccf3390..c0f36ae 100644
--- a/proposals/270-newhope-hybrid-handshake.txt
+++ b/proposals/270-newhope-hybrid-handshake.txt
@@ -432,7 +432,7 @@ Depends: prop#220 prop#249 prop#264 prop#270
   HTYPE   := 0x0003 [2 bytes]
   HLEN:= 0x0780 [2 bytes]
   HDATA   := CLIENT_HDATA   [1920 bytes]
-  IGNORED := 0x00   [194 bytes]
+  IGNORED := 0x00   [190 bytes]
 }

   [XXX do we really want to pad with IGNORED to make CLIENT_HDATA the
@@ -485,7 +485,7 @@ Depends: prop#220 prop#249 prop#264 prop#270
   NSPEC := 0x00   [1 byte]
   HTYPE := 0x [2 bytes]
   HLEN  := 0x [2 bytes]
-  HDATA := 0x00[172]  [172 bytes]
+  HDATA := 0x00[172]  [174 bytes]
 }

   The client sends this to the server to extend the circuit from, and that
@@ -525,7 +525,7 @@ Depends: prop#220 prop#249 prop#264 prop#270
   NSPEC := 0x00   [1 byte]
   HTYPE := 0x [2 bytes]
   HLEN  := 0x [2 bytes]
-  HDATA := SERVER_HDATA[1940,2112][172 bytes]
+  HDATA := SERVER_HDATA[1940,2112][174 bytes]
 }

Regards,
Jaskaran


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Tor in Google Summer of Code 2017

2017-03-07 Thread Jaskaran Singh
Hi Damian,

On Tuesday 07 March 2017 11:54 PM, Damian Johnson wrote:

> Finally, write down your project idea using our template [5] and submit
> your application to Google before March 25th [6].

I think the deadline is April 3 this year.

Regards,
Jaskaran
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Proposal xyz : Count Unique IP addresses in an anonymous way

2017-03-17 Thread Jaskaran Singh
Hi,

Please have a look at this proposal. I will replace xyz with a proper
number once the proposal is finalized. Comments, suggestions and
criticism are welcome.

-
Filename: xxx-Count-unique-IPs-in-anonymous-way.txt
Title: Count Unique IP addresses in an anonymous way
Author: Jaskaran Veer Singh
Created: 14 March 2017
Status: Draft

§0. Introduction

Currently, guard relays and bridges maintain a list of the IP addresses
of the devices that connect to them, for various reasons, such as
letting a bridge check which countries have it blocked. This is
dangerous because if any of these Tor instances is compromised, clients
will be de-anonymized. To solve this issue, this document proposes a new
data structure that keeps track of the unique IP addresses seen but does
not directly keep a list of them.

§1. Specification

§1.1. Notation

  Let `a^b` denote the exponentiation of a to the bth power.

  Let `a == b` denote the equality of a with b, and vice versa.

  Let `a := b` be the assignment of the value of b to the variable a.

  Let `a / b` denote the division of a by b.

  Let `a <= b` denote that a is less than or equal to b.

  Let `a >= b` denote that a is greater than or equal to b.

§2. Research

There are three ways to solve this problem, all of which are big-data
algorithms. A few problems arise for each of them, since they are
designed for big data, while the data we would provide is not
necessarily big.

§2.1. Bloom Filter[1]

A Bloom filter maps each input to positions in a bitmap after passing it
through two or more hash functions. Later, any new input is mapped onto
the bitmap in the same way to check whether that value is already
present in the set. A property of this bitmap is that collisions can
happen, and these collisions create deniability. When a collision
happens, one of the inputs does not get counted as unique (although in
reality it is), but on the other hand this is beneficial, because
whoever gets hold of the data structure can never be 100% sure about the
original inputs. So we get the job done, at some error rate.
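
A minimal illustrative sketch in Python (not the proposed Tor code; the
bitmap size and number of hash functions are arbitrary choices here):

  import hashlib

  M_BITS = 1024          # bitmap size (assumed)
  K_HASHES = 3           # number of hash functions (assumed)

  def _positions(ip):
      # Derive K_HASHES bit positions for one IP address.
      for i in range(K_HASHES):
          digest = hashlib.sha256(("%d:%s" % (i, ip)).encode()).digest()
          yield int.from_bytes(digest[:8], "big") % M_BITS

  class BloomFilter:
      def __init__(self):
          self.bits = [0] * M_BITS

      def add(self, ip):
          for pos in _positions(ip):
              self.bits[pos] = 1

      def probably_contains(self, ip):
          # False positives are possible (collisions); false negatives are not.
          return all(self.bits[pos] for pos in _positions(ip))

  bf = BloomFilter()
  bf.add("198.51.100.7")
  assert bf.probably_contains("198.51.100.7")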

§2.1.1. Obstacle

Suppose the number of inputs is small. Let's say we receive just one
connection in a day from some small, less busy country like Estonia. In
that case there might not be any collision, and the adversary could
recover the IP address with some brute force. Hence this algorithm isn't
suited for us.

§2.2. RAPPOR[2]

RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) is
an algorithm in which the system adds some deterministic and some
non-deterministic noise to the data that has to be stored. This creates
deniability. In our case we don't need the deterministic noise of the
first stage, so we'll stick to adding non-deterministic noise and
storing the result in a Bloom filter (see the sketch below).
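
A small sketch of what the non-deterministic stage could look like,
assuming we randomize the client's Bloom-filter bits before they are
stored (the flip probability is an illustrative value, not part of the
proposal):

  import random

  P_FLIP = 0.25   # assumed probability of replacing a bit with a coin flip

  def randomize_bits(bits):
      noisy = []
      for b in bits:
          if random.random() < P_FLIP:
              noisy.append(random.randint(0, 1))   # random noise -> deniability
          else:
              noisy.append(b)                      # truthful bit
      return noisy

Because the flip probability is known, aggregate counts can still be
corrected for the injected noise, which is the idea RAPPOR relies on.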

One thing to note here is that we should not accept an output of the
non-deterministic randomizer that maps to an IP address in Group D or
Group E, since those addresses are not in use and the adversary could
easily tell that they were produced by adding random noise.

§2.2.1. Obstacle

Using a brute-force technique, the adversary could check whether a
stored IP address is a real one or was produced by random noise: she
could compare the addresses obtained via brute force against the
directory of which IP address ranges are allotted to which country. The
ones that do not match are the ones that were faked with random noise.

§2.3. Probabilistic Counting with Stochastic Averaging[3] (PCSA)

It is based on FM sketches. The algorithm goes as follows:

|  m = 2^b                # with b in [4...16]
|  bitmaps = [[0]*32]*m   # initialize m 32-bit wide bitmaps to 0s
|
|  # Construct the PCSA bitmaps
|  for h in hashed(data):
|      bitmap_index = 1 + get_bitmap_index( h,b )   # binary address of the rightmost b bits
|      run_length   = run_of_zeros( h,b )           # length of the run of zeros starting at bit b+1
|      bitmaps[bitmap_index][run_length] = 1        # set the bitmap bit based on the run length observed
|
|  # Determine the cardinality
|  phi = 0.77351
|  DV  = m / phi * 2 ^ ( sum( least_sig_bit( bitmap ) ) / m )   # the DV estimate

So the error is bounded by roughly 0.78/sqrt(m).
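
For reference, here is a small runnable Python version of the same idea,
filling in the helpers the pseudocode leaves undefined. The hash
function, bucket count and bit width are illustrative choices, not part
of this proposal:

  import hashlib

  B = 6                    # 2^6 = 64 buckets
  M = 2 ** B
  PHI = 0.77351

  def _hash(item):
      return int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")

  def _rho(x):
      # Index of the lowest set bit of x (number of trailing zeros), capped at 31.
      r = 0
      while r < 31 and not (x >> r) & 1:
          r += 1
      return r

  def pcsa_estimate(items):
      bitmaps = [0] * M
      for item in items:
          h = _hash(item)
          bucket = h & (M - 1)                  # lowest B bits pick the bucket
          bitmaps[bucket] |= 1 << _rho(h >> B)  # record the run length
      total = sum(_rho(~bm) for bm in bitmaps)  # lowest unset bit per bucket
      return (M / PHI) * 2 ** (total / M)

  ips = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(2000)]
  print(round(pcsa_estimate(ips)))   # roughly 2000, within ~0.78/sqrt(64) = 10%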

§2.3.1. Obstacle

The algorithms stated above are designed for use on large databases. In
fact, they were invented to save time and space while doing basic set
operations on data with high cardinality. But the data we would provide
as input is not necessarily of high cardinality. Since we would be
counting numbers for each country separately, the expected input
cardinality would be:

0 <= C <= 2500

where C is the actual cardinality of t

Re: [tor-dev] Proposal xyz : Count Unique IP addresses in an anonymous way

2017-03-21 Thread Jaskaran Singh

Hi Andreas,

On Saturday 18 March 2017 10:06 AM, Andreas Krey wrote:
> As an adversary, I wouldn't take down the bridge but either monitor
> the traffic to it ($country can also do this on its border gateways),
> or modify it to tell me the connecting IP addresses.

Absolutely correct. In fact I mentioned this toward the end of the proposal.

>> Or even better, why would the adversary need that random value
>> when she can simply log all network connections coming into the
>> (compromised) system?

But I think that our users don't expect us to profile them, so I believe
it is bad practice for any anonymity system to keep a list of something
as sensitive as IP addresses.

But there is something to be happy about. If we replace the list of IP
addresses with something better, we might be able to prevent the
adversary from learning which IP addresses connected in the past.
Suppose an adversary (let's say a government) suspects that someone used
a bridge to connect to the Tor network. They would only be able to know
that for sure if they logged that activity there and then; in other
words, they would not be able to de-anonymize users retrospectively. Of
course, if the user connects again from the same IP while the adversary
is monitoring the relay/bridge, that could be a problem.

> End users tend to be on dynamic IP address, so stored IP addresses
> aren't of much worth when you don't know when they were used; that
> is a reason why $adversary might be more interested in snooping
> than in compromising the bridge.
> 
> (Although I don't know how prevalent changing IP addresses still
> are when you're online permanently. E.g. here in germany telekom
> changes to all-ip, and there no longer disconnects after 24h, and
> thus you don't change IPs every day.)
> 
> ...
>> present in the set. The feature of this bitmap is that collisions could
>> happen. And this collision creates deniability. When collisions happen,
> 
> The problem is that for the accounting purposes you don't want (too
> many) collisions, and also that state agencies don't necessarily
> care for plausible deniability - if an IP address is found by
> enumeration and probing the bloom filter they might still decide
> to put that user on closer watch. (I've heard that a lot of the
> traditional telephone tapping isn't used as evidence in court
> but produces leads to where to investigate next.)
> 
> On the other hand side you can indeed keep the filter rather small
> because one bridge doesn't get that many collisions, and you don't
> need to make it anywhere as big as to avoid collision with 2^32 entries.
> Could also be dynamically sized depending on the number of clients seen
> - you need aging anyway, so the next table can have a different size.
> 
I feel that this isn't a good solution. Suppose you start with 10 cells
and 3 hash functions. Later, when the inputs exceed a threshold, you
grow the Bloom filter to 20 cells. Those 3 hash functions now map over
the whole new range, which means the inputs that were mapped into 10
cells would map to completely different positions. Hence the error rate
on the old entries would be, I guess, essentially 100% (see the sketch
below).
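
A tiny demonstration of the problem (hypothetical parameters): the same
key maps to different cells once the modulus changes from 10 to 20, so
membership tests on entries inserted before the resize fail.

  import hashlib

  def positions(key, num_cells, num_hashes=3):
      return [int.from_bytes(hashlib.sha256(("%d:%s" % (i, key)).encode())
                             .digest()[:8], "big") % num_cells
              for i in range(num_hashes)]

  key = "203.0.113.42"
  print(positions(key, 10))   # cells used while the filter had 10 cells
  print(positions(key, 20))   # cells the same key maps to after growing to 20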

> You can also go and poison the bloom filter with some random addresses,
> even a lot, actually. If we're talking of 2000 users you can easily
> throw in another 2000 random addresses without decreasing the
> precision of the statistics much - only on a size comparable to
> collisions in the bloom filter itself.
> 
> - Andreas
> 

Regards,
Jaskaran





___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Proposal xyz : Count Unique IP addresses in an anonymous way

2017-03-21 Thread Jaskaran Singh
Hi,

So here's the updated part of the proposal.



§ Threat model & Security Considerations

Consider the adversary with the following powers:

 - Has sufficient computational and storage power to brute-force any
   method that can be brute-forced.

 - Can repeatedly gain control of the guard node/bridge in question.

 - Can interact with the data structure that stores the unique IP
   addresses/hash values/Bloom filter/bitmaps etc.

 - Can also log incoming connections and IP addresses outside the realm
   of Tor (i.e. at the system level, at gateways, etc.).

 - Can craft incoming connections with made-up IP addresses so as to
   observe how our proposed solution behaves.

 - As a consequence of the previous power, can also inject patterns of
   IP addresses to look for corresponding patterns in the stored data
   structure.

An ideal solution would not involve hashing, or, even if it did, it
would manipulate the hash before storing it in such a way that the
adversary cannot learn the IP addresses even with a brute-force attack.

An ideal solution would also not help the adversary observe any pattern
in the stored data structure. This could be accomplished by
incorporating a salted hash (or a variation of it) into the proposed
solution, with the salt changed every time we start tracking unique IP
addresses. A small sketch of this idea follows.
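
The sketch below is an assumed design, not anything Tor currently
implements: hash each address with a random salt and rotate the salt
whenever a new measurement period starts, so that old values cannot be
brute-forced once the salt is discarded.

  import hashlib
  import os

  class SaltedCounter:
      def __init__(self):
          self.rotate()

      def rotate(self):
          # New measurement period: forget the old salt and all entries.
          self._salt = os.urandom(16)
          self._seen = set()

      def note_address(self, ip):
          self._seen.add(hashlib.sha256(self._salt + ip.encode()).digest())

      def unique_count(self):
          return len(self._seen)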

There is a fundamental limitation to what we can do: we cannot stop an
adversary from learning IP addresses at the system level, at gateways,
etc. But the thing to cheer about is that, this way, the adversary
cannot learn about users retrospectively.



Regards,
Jaskaran



___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Issues With Ticket #7532 - "Count unique IPs in an anonymous way"

2017-03-28 Thread Jaskaran Singh
Hi Samir,

Brute force does affect a Bloom filter or plain hashed values, as you
rightly mentioned, but not Probabilistic Counting with Stochastic
Averaging (PCSA).

PCSA works on the principle that the probability of an input having n
consecutive '0' bits from the left side (it could be the right side as
well, but for now assume the left) is 2^(-(n+1)). Bit 'i' of the bitmap
(which is our main data structure) is set if the number of consecutive
zeros (from the left) is 'i'.

We keep repeating this for every input (IP address). We then end up with
a bitmap whose most significant '1' can be used to compute an
approximate count of the inputs that must have gone into the algorithm.

In simple words, if I tell you that among the values I examined I saw
one with five consecutive leading zeros, you could guess that I had
examined on the order of 2^5 values before I saw that particular value.

We would tweak the algorithm to store only the most significant '1' in
the bitmap, instead of storing a '1' at every iteration. This would mean
that all the adversary could get hold of is a bitmap in which just one
bit is '1'.

For example, the adversary might get a data structure that looks like:
0100
and would have no way to tell what IP addresses were used as input.
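
A short sketch of that tweak, assuming a simple hash and a 32-bit width
for illustration: only the longest run of leading zeros seen so far is
kept, so the stored state is a single small number (equivalently, a
bitmap with one bit set).

  import hashlib

  def leading_zeros(ip, width=32):
      h = int.from_bytes(hashlib.sha256(ip.encode()).digest()[:4], "big")
      return width - h.bit_length()      # zeros before the first '1'

  max_run = 0
  for ip in ("192.0.2.1", "192.0.2.2", "198.51.100.9"):
      max_run = max(max_run, leading_zeros(ip))

  # All an attacker recovers from memory is max_run (one set bit), plus the
  # rough estimate 2**max_run of how many distinct addresses were seen.
  print(max_run, 2 ** max_run)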

This was just the basic idea behind PCSA. The actual PCSA uses a more
involved formula to estimate the number of unique IP addresses while
keeping the error rate low.

I hope this makes sense.

For some more information and simulation, please check

[0] https://research.neustar.biz/2013/04/02/sketch-of-the-day-probabilistic-counting-with-stochastic-averaging-pcsa/
[1] http://content.research.neustar.biz/blog/runs.html

Regards,
Jaskaran

On Wednesday 29 March 2017 01:54 AM, samir menon wrote:
> This ticket [1] was suggested as a GSoC project, but I think there might
> be an issue with the security model/perceived threat.
> 
> To summarize the ticket and its child [1], basically, we currently store
> all the IP's seen by a node so that we can count unique IP's. The idea
> is that this is dangerous; if a node is compromised, then all of those
> IP addresses can be retrieved from memory. Therefore, a variety of
> mitigation methods have been proposed (most prominently, the
> 'Probabilistic Counting Algorithm' from [2])
> 
> Here's my issue: what about brute force? 
> 
> No matter what method we use, we will arrive at a data structure that
> should be able to, given an IP address, tell us whether it is new (and
> we should increment the unique counter) or old (and we should leave the
> unique counter the same), with some reasonably small false positive
> rate. Basically, we're supposed to use some kind of Bloom filter like
> structure.
> 
> Then can't that structure then be brute-forced, offline, by an attacker?
> IPv4 addresses are 32-bits (~4.3 billion of them), so an attacker could
> just run whatever method we use to check membership over and over, and
> then recover the set of IP's. The same happens if we hash the IP's
> beforehand.
> 
> So, is this attack acceptable? The only mitigation I've seen is the one
> referenced by 'Aaron' in the ticket, which is the system that git uses,
> cryptolog; there, they have a random salt that changes daily. Then, an
> attacker can only learn the IP's for one day. This sounds like a
> reasonable compromise to me, but then the implementation becomes rather
> simple; just hash the IP's with a random salt that changes daily before
> putting them in the set.
> 
> IPv6 also solves this (128 bits), but there again, the solution is just
> to hash the IP's before storing them - the Bloom filter/'Probabilistic
> Counting Algorithm' is unnecessary.
> 
> I think I must be missing something about how the 'Probabilistic
> Counting Algorithm' works - somehow, it needs to keep track of the # of
> unique IP's without knowing (with a high probability) whether any 1
> individual IP has been seen. 
> 
> Any help/pointing out of errors in my reasoning would be useful. 
> 
> Thanks,
> Samir Menon
> menon.sa...@gmail.com 
> sam...@stanford.edu 
> 
> [1] https://trac.torproject.org/projects/tor/ticket/7532
> [2] 
> http://www.mathcs.emory.edu/~cheung/papers/StreamDB/Probab/1985-Flajolet-Probabilistic-counting.pdf
> 
> 
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> 

-- 
Jaskaran Veer Singh (jvsg)
jvsg1303 at gmail dot com
PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Anonymous Local Count Statistics Using PCSA - GSoC

2017-04-01 Thread Jaskaran Singh
Hi Aaron,

These statistics don't just tell us the user's country; they also keep
track of unique IP addresses connecting from each country. This is
needed in order to present more realistic stats. If we incremented the
counter on every IP address instead of every unique IP address, the
statistics would also reflect users connecting again and again: we would
have stats about per-country usage rather than per-country users. We can
do much better and implement a way (as described by the OP of the
thread) that counts unique IPs while preserving privacy.

As for your second point about hiding the actual counter from the
adversary, I agree that this can potentially de-anonymize a client. An
adversary (let's say the government of some small, less populous
country) could try to fingerprint the traffic of its target(s) and later
correlate it with the data we publish on the metrics site. This attack
could work very well for countries whose Tor users can be counted on one
hand. So I believe hiding the counter data should also be implemented
along with hiding the IP addresses.

Regards,
--
Jaskaran Veer Singh (jvsg)
jvsg1303 at gmail dot com
PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Anonymous Local Count Statistics Using PCSA - GSoC

2017-04-02 Thread Jaskaran Singh
Oops, clicked the SEND button accidentally.
Sorry! You can ignore it.


> Ah! That reminds me that the OP (of this thread) should also aim to
> fix #8786 along with this, which could enable such a counting
> technique for pluggable transports.
> 
> Now coming to the main point,
> 
>> In addition, each user doesn’t necessarily correspond to
>> a different IP because of NAT, and so counting connections may actually be
>> more accurate.
> I agree,

-- 
Jaskaran Veer Singh (jvsg)
jvsg1303 at gmail dot com
PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Open topics of prop247: Defending Against Guard Discovery Attacks using Vanguards

2017-06-11 Thread Jaskaran Singh
Hi George,

On Wednesday 17 May 2017 05:21 PM, George Kadianakis wrote:
> 1.1. Visuals
>
>  Here is how a hidden service rendezvous circuit currently looks like:
>
> -> middle_1 -> middle_A
> -> middle_2 -> middle_B
> -> middle_3 -> middle_C
> -> middle_4 -> middle_D
>   HS -> guard   -> middle_5 -> middle_E -> Rendezvous Point
> -> middle_6 -> middle_F
> -> middle_7 -> middle_G
> -> middle_8 -> middle_H
> ->   ...->  ...
> -> middle_n -> middle_n
>
>  this proposal pins the two middles nodes to a much more restricted
>  set, as follows:
>
>  -> guard_3A_A
> -> guard_2_A -> guard_3A_B
>  -> guard_3A_C -> Rendezvous Point
>   HS -> guard_1
>  -> guard_3B_D
> -> guard_2_B -> guard_3B_E
>  -> guard_3B_F -> Rendezvous Point
>
>
>  Note that the third level guards are partitioned into buckets such that
>  they are only used with one specific second-level guard. In this way,
>  we ensure that even if an adversary is able to execute a Sybil attack
>  against the third layer, they only get to learn one of the second layer
>  Guards, and not all of them. This prevents the adversary from gaining
>  the ability to take their pick of the weakest of the second-level
>  guards for further attack.

I think this scheme works as follows: if there are x third-level guards,
they are divided into buckets of x/k guards each, where k is the number
of second-level guards. Now, I feel that dividing the guards into
buckets is a little pointless. Suppose we have 1000 possible third-level
guards and 500 possible second-level guards, and we have to select 4
third-level guards for each bucket and 2 second-level guards for each
hidden service. Even in this case the adversary has to do as much work
as before. And what happens once the guards are divided into buckets? At
least now the pool of third-level guards within which the Sybil attack
has to be conducted is reduced: the 1000 third-level guards get divided
into pools of 500 each. That makes the attack easier to accomplish, but
to take advantage of it the adversary has to allocate 2x the resources,
so the net result is zero.

I haven't had my coffee, so please correct me if I'm wrong somewhere :)

Regards,
-- 
Jaskaran Veer Singh (jvsg)
jvsg1303 at gmail dot com
PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Start contributing to Tor

2017-12-03 Thread Jaskaran Singh
Hi Aruna,

You could have a look at these spec files if you haven't already
https://gitweb.torproject.org/torspec.git/tree/

Regards,
Jaskaran

On Mon, Dec 4, 2017 at 12:23 PM, Aruna Maurya 
wrote:

> I am new to the community and would like to contribute and help along. I
> did a complete read up on how the Tor browser works, but I would like to
> delve in more and get acquainted with the code base, so that I understand
> and learn a lot in the process.
>
> I already cloned and built the Tor(core) and TorBrowser from source for
> easy understanding and primarily as it would help me to reproduce bugs as I
> work on them.
>
> Any further guidance is appreciated.
>
> Thankyou for spending the time to read this through.
>
> --
> Regards,
> Aruna Maurya,
> CSE,B.tech,
> Blog  | Medium
> 
>
>
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Start contributing to Tor

2017-12-04 Thread Jaskaran Singh
Yes, somewhat. They can give you an overall idea, but the best
documentation is the code itself.

On Mon, Dec 4, 2017 at 1:05 PM, Aruna Maurya 
wrote:

> Hey!
>
> Thanks for the spec files. But why and what do they exactly do? Are they
> somewhat like a documentation of everything?
>
>
> On Mon, Dec 4, 2017 at 12:54 PM, Jaskaran Singh 
> wrote:
>
>> Hi Aruna,
>>
>> You could have a look at these spec files if you haven't already
>> https://gitweb.torproject.org/torspec.git/tree/
>>
>> Regards,
>> Jaskaran
>>
>> On Mon, Dec 4, 2017 at 12:23 PM, Aruna Maurya 
>> wrote:
>>
>>> I am new to the community and would like to contribute and help along. I
>>> did a complete read up on how the Tor browser works, but I would like to
>>> delve in more and get acquainted with the code base, so that I understand
>>> and learn a lot in the process.
>>>
>>> I already cloned and built the Tor(core) and TorBrowser from source for
>>> easy understanding and primarily as it would help me to reproduce bugs as I
>>> work on them.
>>>
>>> Any further guidance is appreciated.
>>>
>>> Thankyou for spending the time to read this through.
>>>
>>> --
>>> Regards,
>>> Aruna Maurya,
>>> CSE,B.tech,
>>> Blog <https://themindreserves.wordpress.com/> | Medium
>>> <https://medium.com/@arunamaurya>
>>>
>>>
>>> ___
>>> tor-dev mailing list
>>> tor-dev@lists.torproject.org
>>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>>>
>>>
>>
>
>
> --
> Regards,
> Aruna Maurya,
> CSE,B.tech,
> Blog <https://themindreserves.wordpress.com/> | Medium
> <https://medium.com/@arunamaurya>
>
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Starting with contributing to Anonymous Local Count Statistics.

2018-02-02 Thread Jaskaran Singh
Hi,

I thought the project idea had already been deprecated in favor of
counting unique users by directory fetches. No?

Regards,
Jaskaran

On Fri, Feb 2, 2018 at 3:48 PM, George Kadianakis 
wrote:

> Aruna Maurya  writes:
>
> > [ text/plain ]
> > Hey!
> >
> > What is the current status of the project, how much work has been done
> and
> > where can I pick up from?
> >
>
> Hi!
>
> The project is currently not being worked on.
>
> Mainly design work has been done so far; no code has been written.
> See:   https://lists.torproject.org/pipermail/tor-dev/2017-March/
> 012001.html
>https://lists.torproject.org/pipermail/tor-dev/2017-March/
> 012073.html
>
> I suggest you pick it up by fleshing out the design work and seeing if
> it works for you, and then checking out the code to see where you need
> to inject the code. Perhaps you can also get in touch with Jaskaran
> Singh (jvsg1...@gmail.com) who did all the previous design work to see
> if he is interested in collaborating!
>
> Cheers!
>
>
>
> > On Fri, Feb 2, 2018 at 3:04 PM, Aruna Maurya 
> > wrote:
> >
> >>
> >> -- Forwarded message --
> >> From: George Kadianakis 
> >> Date: Wed, Jan 31, 2018 at 6:32 PM
> >> Subject: Re: [tor-dev] Starting with contributing to Anonymous Local
> Count
> >> Statistics.
> >> To: Aruna Maurya ,
> tor-dev@lists.torproject.org
> >>
> >>
> >> Aruna Maurya  writes:
> >>
> >> > [ text/plain ]
> >> > Hey!
> >> >
> >> > I was going through the Tor Volunteer page and came across the
> Anonymous
> >> > local count statistics project. As a student it would be a great
> starting
> >> > point and an even bigger opportunity to get a chance to collaborate
> and
> >> > learn in the process.
> >> >
> >> > I would like to contribute to it, and would love to start as soon as
> >> > possible. It would be great if someone could guide me through.
> >> >
> >>
> >> Hello Aruna,
> >>
> >> thanks for reaching out.
> >>
> >> I also find this project interesting. I'd like to help you but my time
> >> is quite limited lately.
> >>
> >> What would you like guidance with?
> >>
> >> With regards to design, I suggest you take a look at the last comments
> >> of this trac ticket:  https://trac.torproject.org/pr
> >> ojects/tor/ticket/7532#comment:22
> >> Particularly it seems like the PCSA algorithm might be a reasonable way
> >> forward.
> >>
> >> With regards to coding, I suggest you familiarize yourself with the Tor
> >> codebase. Some specific places to look at would be the way that Tor
> >> currently counts users. For example, see geoip_note_client_seen() and
> >> its callers, for when bridges register new clients to their stats
> >> subsystem. Also check geoip_format_bridge_stats() for when bridges
> >> finally report those stats.
> >>
> >> Let us know if you have any specific questions!
> >>
> >> Cheers!
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Aruna Maurya,
> >> CSE,B.tech,
> >> Blog <https://themindreserves.wordpress.com/> | Medium
> >> <https://medium.com/@arunamaurya>
> >>
> >>
> >
> >
> > --
> > Regards,
> > Aruna Maurya,
> > CSE,B.tech,
> > Blog <https://themindreserves.wordpress.com/> | Medium
> > <https://medium.com/@arunamaurya>
> > [ text/plain ]
> > ___
> > tor-dev mailing list
> > tor-dev@lists.torproject.org
> > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Starting with contributing to Anonymous Local Count Statistics.

2018-02-02 Thread Jaskaran Singh
Hi,

Oops. I meant "counting unique users just like we do with directory
fetches". The argument given was that most users are already behind NAT,
and hence counting unique IP addresses would not be accurate anyway. It
was suggested that we count per-country connection statistics (that is,
the total number of connections coming from a country) and divide that
by the average number of connections a user makes, to arrive at an
estimated number of unique users. I cannot find where I got to know
this; maybe on IRC, but I don't have logs from a year ago.

Also, the Metrics team has(?) to come up with a proposal on this, IIRC.
Until then it would not be considered a valid project?

Karsten would have something to say on this.

cc: karsten

On Fri, Feb 2, 2018 at 4:33 PM, George Kadianakis 
wrote:

> Jaskaran Singh  writes:
>
> > [ text/plain ]
> > Hi,
> >
> > I thought the project idea had already been depreciated in favor of
> > counting unique users by directory fetches. No?
> >
>
> Yes, we do count unique users by directory fetches for the "active Tor
> users" metric: https://metrics.torproject.org/userstats-relay-country.html
>
> But we also use in-memory data structures tracking IP addresses to count
> unique users per-country: https://metrics.torproject.
> org/userstats-bridge-combined.html?start=2017-11-04&end=
> 2018-02-02&country=dz
>
> I was not aware that we are planning to deprecate the latter in favor of
> counting directory fetches. Did you get that from somewhere? Perhaps it
> could make sense, not sure.
>
> Cheers!
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Proposal: Check Maxmind GeoIP DB before distributing

2018-06-30 Thread Jaskaran Singh
Hi List,

Please have a look at this proposal.

Filename: Check-Maxmind-GeoIP-DB-before-distributing.txt
Title: Check Maxmind GeoIP-DB before distributing
Ticket(s): #26240
Author: Jaskaran Singh
Created: June 2018
Status: Open

0. Motivation and Overview
We're using Maxmind's GeoIP database (Maxmind is a company registered in
the US), which is not just antithetical to the philosophy that one
should not rely entirely on a single service or piece of software for
all needs, but also has some serious security repercussions.

Trusting Maxmind's GeoIP database is dangerous, as it may enable some
attacks on the network. We propose that the database be checked for
integrity before being distributed to users. The whole process of
checking for integrity can be assigned to the directory authorities (or
any trusted systems), which would be responsible for carrying it out
using a script.

We should also give the user a choice of whether to use Maxmind's DB,
any other DB of her choice, or no GeoIP DB at all.

1. Threat Model
We assume an adversary that is capable of introducing false information
into the Maxmind GeoIP database, either through its influence over the
company or otherwise. The adversary also has enough resources to perform
a Sybil attack on the network.

2. Attacks on the Network

2.1 Sybil attack under the Radar
The Tor network is constantly monitored for any suspicious spike in
nodes, as such a spike may indicate an oncoming or ongoing Sybil attack.
A powerful adversary can coerce Maxmind into mapping some specific IP
address blocks to different countries. This may keep the people and
scripts monitoring the network from becoming suspicious of the event,
letting the adversary stay under the radar.

2.2 False Location indication for a shady node
A large percentage of people don't want the exits of their circuits to
be located in certain countries where communication is under
surveillance. A powerful adversary knows this as well. Users generally
add a line to their config that keeps them from building circuits
through nodes located in those countries. To overcome this, the
adversary can coerce Maxmind into altering its database so that some
particular IPs map to locations the user believes are havens of free
speech.

3. Design of the Solution
We should check the Maxmind database against its own previous versions.
Additionally, we should stop using the GeoIP database intrinsically for
every purpose, but still allow users to plug in their own databases
through the interface we implement. Perhaps the latter can be introduced
as a ./configure option for when the user is highly distrustful of
Maxmind and wants to use a service she trusts, or doesn't want to use
one at all. The two solutions are explained below.

3.1 Checking for integrity

Step 1: The Dir Authorities (or any trusted computers) fetch the latest
Maxmind GeoIP DB along with its previous versions.

Step 2: Tor nodes' locations are checked against the previous versions
for any changes.

Step 3: All the Dir Authorities perform the above two steps
independently of each other. A count of the number of changes in node
locations is maintained. If there are a significant number of changes,
they are viewed with suspicion, since this can be the preparation for a
Sybil attack by the adversary; in such a case, the new changes to the
database can be discarded. Even a change in a single node's location is
concerning, but it is not easy to attribute such a change to malice:
sometimes there are genuine reasons for a location to change. (A rough
sketch of this check follows below.)

Step 4. This database is then distributed to the users.
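
A rough Python sketch of the comparison in Steps 2 and 3 (the file
format, loader and threshold are assumptions for illustration, not part
of this proposal):

  import csv

  CHANGE_THRESHOLD = 0.01   # assumed: >1% of relays changing country is suspicious

  def load_mapping(path):
      # Placeholder loader for a simple "ip,country" CSV dump of the DB.
      with open(path, newline="") as f:
          return {row[0]: row[1] for row in csv.reader(f) if row}

  def suspicious_update(old_path, new_path, relay_ips):
      old, new = load_mapping(old_path), load_mapping(new_path)
      changed = [ip for ip in relay_ips
                 if ip in old and ip in new and old[ip] != new[ip]]
      ratio = len(changed) / max(len(relay_ips), 1)
      return ratio > CHANGE_THRESHOLD, changed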

3.2 Doing away with GeoIP location altogether
GeoIP databases are occasionally unreliable and can be done away with
safely. We can provide a ./configure option that lets users plug in
their own trusted service. If the user doesn't have access to a database
of her own choice, she can simply choose Maxmind, or not use any
database at all. This would remove our dependence on a single database
and diversify our usage.

4. Licensing issues
Maxmind has a pretty liberal license when it comes to its database, as
summarized below:

Maxmind - CC BY-SA 4.0
* Copy and redistribute the material in any medium or format
* Remix, transform, and build upon the material for any purpose, even
  commercially

5. Dealing with false positives
Maxmind calculates the geolocation of an IP address using WHOIS records,
reverse DNS, etc. It claims a precision rate of 99.5% at the country
level. The remaining 0.5% are most likely IP addresses for which neither
WHOIS records nor reverse DNS are set up.

A very large percentage of Tor nodes are run from datacenters, which
usually have all their records set up. It is highly unlikely for an IP
address belonging to a datacenter to be mapped to a wrong location.

Hence, false positives would be very few, and can be safely ignored
after