[issue41045] f-string's "debug" feature is undocumented

2020-07-13 Thread Rishi


Rishi  added the comment:

Hello all,

Could I help by adding this to the documentation?

--
nosy: +rishi93

___
Python tracker 
<https://bugs.python.org/issue41045>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39017] Infinite loop in the tarfile module

2020-07-12 Thread Rishi


Rishi  added the comment:

Thank you. I have signed the CLA, pushed my code changes, and also written a
test case for this issue.

--

___
Python tracker 
<https://bugs.python.org/issue39017>
___



[issue39017] Infinite loop in the tarfile module

2020-07-12 Thread Rishi


Change by Rishi :


--
keywords: +patch
pull_requests: +20602
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/21454

___
Python tracker 
<https://bugs.python.org/issue39017>
___



[issue39017] Infinite loop in the tarfile module

2020-07-10 Thread Rishi


Rishi  added the comment:

Hi! I would like to start contributing to CPython. Can I start working on this
issue?

--

___
Python tracker 
<https://bugs.python.org/issue39017>
___



[issue39017] Infinite loop in the tarfile module

2020-07-10 Thread Rishi


Change by Rishi :


--
nosy: +rishi93

___
Python tracker 
<https://bugs.python.org/issue39017>
___



[issue1610654] cgi.py multipart/form-data

2014-12-09 Thread Rishi

Rishi added the comment:

One of my comments exceeded the wrapped line limit. I also changed the test in
question from checking the lengths of the expected and actual buffers to
checking the contents of the respective buffers.

--
Added file: http://bugs.python.org/file37400/issue1610654_5.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-12-09 Thread Rishi

Rishi added the comment:

There is indeed a test failure that occurs without the patch. It is in a new
test that I added.
The reason is that in the existing implementation, when a boundary does not
exist, the trailing CRLF, LF or, for that matter, CR is not included as part of
the payload. I think that is not correct.

However, to keep this patch compatible with the behavior of the existing
implementation, I have updated the patch to strip a single CRLF, LF or CR from
the payload if a boundary is not found.
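
The compatibility rule described above can be sketched as follows. This is only
an illustration, not the code from the patch; the helper name
strip_one_line_ending is an assumption made for this example.

```python
def strip_one_line_ending(payload: bytes) -> bytes:
    """Remove at most one trailing CRLF, LF or CR from `payload`."""
    # CRLF is checked first so that b"\r\n" is removed as a single unit.
    for ending in (b"\r\n", b"\n", b"\r"):
        if payload.endswith(ending):
            return payload[:-len(ending)]
    return payload
```

Only one line ending is removed, so a payload that really ends in several
newlines keeps all but the last one.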

--
Added file: http://bugs.python.org/file37399/issue1610654_4.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-11-08 Thread Rishi

Rishi added the comment:

Hi,
I have created a new patch with a small design change. In situations where I
don't find the boundary, instead of keeping the last x bytes in the buffer I
simply drain the whole data and call readline().
This also seems like the right thing to do. I managed to get rid of the two
obfuscated helper functions, keep_x_buffer and remove_x_buffer, and the code
should (I hope) look familiar to a module owner.
This also helped me get rid of quite a few class member variables, which I
could move to locals of my main function (multi_read).
I still need to maintain an overlap, but only for the trailing CRLF boundary.
I ran all the new and old tests and tested on Apache with the Ubuntu server ISO
image. Without the patch the Ubuntu server ISO image took 93 seconds; with the
patch it took 25 seconds.

--
Added file: http://bugs.python.org/file37149/issue1610654_3.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-11-04 Thread Rishi

Rishi added the comment:

Patch updated from review comments. Also added a few corner test cases.

--
Added file: http://bugs.python.org/file37128/issue1610654_2.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-10-14 Thread Rishi

Rishi added the comment:

I have recreated the patch (issue1610654_1.patch) and it performs more or less
like the earlier patch.

Serhiy,
I agree we cannot use handmade buffering here without seeking ahead.
I believe we can make optimizations for streams which are buffered and
non-seekable.
The cgi module's default value for the file object is the BufferedReader of
sys.stdin, so the solution is fairly generic.

I have removed the handmade buffering. Nor do I create a Buffered* object.
We rely on the user to create the buffered object. The sys.stdin that the cgi
module uses has a decent buffer underneath that works well on Apache.

The attached patch does not seek, nor does it read ahead. It only looks ahead.
As Antoine suggests, it peeks at the buffer and determines through a fast
lookup whether the buffer contains a boundary or not.
It moves forward only if it is convinced that the current buffer is completely
within the next boundary.
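
The look-ahead idea can be illustrated with io.BufferedReader.peek(). This is a
minimal sketch, not the code from the patch: the function name
boundary_is_buffered and the sample boundary are assumptions, and an in-memory
stream stands in for a buffered sys.stdin.

```python
import io


def boundary_is_buffered(reader: io.BufferedReader, boundary: bytes) -> bool:
    """Return True if `boundary` appears in the bytes already buffered.

    peek() returns buffered bytes without advancing the stream position, and
    it may return fewer or more bytes than requested.
    """
    return boundary in reader.peek(len(boundary))


# Demo on an in-memory stream (an assumption standing in for sys.stdin).
reader = io.BufferedReader(io.BytesIO(b"payload\r\n--XYZ--\r\n"))
found = boundary_is_buffered(reader, b"\r\n--XYZ")
position_after_peek = reader.tell()  # the look-ahead consumed nothing
```

Because peek() does not move the position, a later read() still starts at the
first payload byte.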


The issue is that the current implementation deals with lines and not chunks.
Even when a savvy user wraps sys.stdin in a large BufferedReader, there is
little to no performance improvement in the current implementation for large
files, in my observation. It does not solve the bug mentioned either.
The difference in extreme cases like Chui's is 53s against 0.7s, and even
otherwise, for larger files, the patch is 3 times faster than the current
implementation.
I have tested this on an Apache2 server where sys.stdin is buffered.

--
Added file: http://bugs.python.org/file36927/issue1610654_1.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-10-13 Thread Rishi

Rishi added the comment:

Antoine, I will upload a patch that relies on BufferedReader. As you mentioned,
it will get rid of the need to support the buffer myself and remove a lot of
code.
The only issue is that it helps me to know whether the current buffer is at EOF,
because patterns at EOF are different (the documentation of peek does not
guarantee EOF if the buffer returned is shorter than what we expect), but I
think I can work around that.
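
One possible check, shown here only as a sketch and not as the workaround used
in the patch: on a blocking stream, peek() returns an empty bytes object once
the underlying stream is exhausted, whereas a short non-empty result says
nothing about EOF. The helper name at_eof is an assumption.

```python
import io


def at_eof(reader: io.BufferedReader) -> bool:
    """Return True if a blocking `reader` has no more data to deliver."""
    # An empty result means the buffer is empty and the raw read hit EOF.
    # A short but non-empty result only means that less data was buffered.
    return reader.peek(1) == b""


# Demo on an in-memory stream (an assumption for this example).
reader = io.BufferedReader(io.BytesIO(b"abc"))
before = at_eof(reader)   # data is still available
reader.read(3)            # consume everything
after = at_eof(reader)    # nothing left to peek at
```

This assumes a blocking stream; a non-blocking raw stream can have no data
available without being at EOF.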

--

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue1610654] cgi.py multipart/form-data

2014-10-13 Thread Rishi

Rishi added the comment:

My observation is that a file with an unusually high number of line-feed
characters (exact numbers below) takes far too long to parse.

I tried porting the above patch to my default branch. It has some boundary and
CRLF/LF issues, but more importantly it relies on seeking the file object,
which in the real world is stdin for uploads from web browsers, so seeking is
illegal in that environment.

I have attached a patch which is based on the same principle as Chui mentioned,
i.e. reading a large buffer, but this patch does not deal with line feeds at
all. It instead searches for the entire boundary in a large buffer.

The cgi module file object only relies on readline and read functionality, so
I created a wrapper class around read and readline to introduce buffering
(attached as patch).
 
When multipart boundaries are being searched, the patch fills a huge buffer, 
like in the original solution. It searches for the entire boundary and returns 
a large chunk of the payload in one call, rather than line by line.

The search has corner cases (when a boundary overlaps two buffers) and CRLF
issues. A boundary could itself contain repeating characters, which adds to the
search complexity.
To overcome this, the patch uses simple regular expressions without any
expanding or wildcard characters. If a boundary is not found, it returns the
chunk minus the length of the boundary and its CRLF prefix, to ensure that no
boundary overlaps two consecutive buffers. The expressions take care of the
CRLF issues.
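
The overlap handling can be sketched roughly as below. This is an illustration
under stated assumptions, not the patch itself: the generator name
read_until_boundary and its chunk_size parameter are hypothetical, the boundary
passed in is assumed to already include its CRLF prefix, and what follows the
boundary in the stream is ignored.

```python
import io
import re


def read_until_boundary(stream, boundary: bytes, chunk_size: int = 1 << 16):
    """Yield payload chunks from `stream` up to, but not including, `boundary`."""
    if not boundary:
        raise ValueError("boundary must not be empty")
    # Escaped literal pattern: no wildcards, no expanding constructs.
    pattern = re.compile(re.escape(boundary))
    # Longest boundary prefix that could be cut off at the end of a buffer.
    keep = len(boundary) - 1
    buf = b""
    while True:
        data = stream.read(chunk_size)
        buf += data
        match = pattern.search(buf)
        if match:
            if match.start():
                yield buf[:match.start()]
            return
        if not data:
            # EOF without a boundary: whatever is left is payload.
            if buf:
                yield buf
            return
        if len(buf) > keep:
            # Hold back the tail so a split boundary is found on the next pass.
            yield buf[:len(buf) - keep]
            buf = buf[len(buf) - keep:]


# Demo: a tiny chunk size forces the boundary to straddle two reads.
body = io.BytesIO(b"a" * 10 + b"\r\n--XYZ" + b"rest")
payload = b"".join(read_until_boundary(body, b"\r\n--XYZ", chunk_size=4))
```

Holding back len(boundary) - 1 bytes is enough because a complete boundary can
never fit inside that tail.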

When read and readline are called, the patch looks for data in the buffer and 
returns appropriately.

There is an overall performance improvement for large files, and a very
significant one for files with a very high number of LF characters.

To begin with, I created a 20MB file with 20% of the file filled with line feeds.

File - 20MB.bin
size - 20MB
description - file filled with 20% (~4MB) '\n'
Parse time with default cgi module - 53 seconds
Parse time with patch - 0.4s

This time increases linearly with the number of LFs for the default module,
i.e. keeping the size the same at 20MB and doubling the number of LFs to 40%
would double the parse time.
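
A test payload of this kind could be generated along the following lines. This
is a sketch only: the helper name make_lf_heavy_payload is hypothetical, and
how the line feeds were distributed in the original 20MB.bin is not stated, so
the regular 100-byte block layout used here is an assumption.

```python
def make_lf_heavy_payload(size: int, lf_percent: int) -> bytes:
    """Return `size` bytes of which about `lf_percent` percent are line feeds."""
    if not 0 <= lf_percent <= 100:
        raise ValueError("lf_percent must be between 0 and 100")
    # Each 100-byte block holds lf_percent line feeds followed by filler bytes.
    block = b"\n" * lf_percent + b"x" * (100 - lf_percent)
    repeats = size // len(block) + 1
    return (block * repeats)[:size]
```

For example, make_lf_heavy_payload(20 * 1024 * 1024, 20) gives a payload of
about the same shape as the file described above, and passing 40 doubles the
number of line feeds at the same size.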

I also tried a normal large binary file that I found on my machine.
size - 88MB
description - binary executable on my machine,
  binary image has 140k LFs.
Parse time with default cgi module - 2.7s
Parse time with patch - 0.7s

I have tested with a few other files and noticed that the time is cut by at
least half for large files.


Note: 
These numbers are consistent over multiple observations.
I tested this using the script attached, and also on my localhost server.
The time taken is obtained by running the following code.

import cgi
import cProfile
import time

t1 = time.time()
cProfile.run("fs = cgi.FieldStorage()")  # profile the multipart parse
print(len(fs['datafile'].value))
t2 = time.time()
print(t2 - t1)

I have tried to keep the patch compatible with the current module. However, I
have introduced a ValueError exception in the module when the boundary is very
large, i.e. 1024 bytes. The RFC specifies the maximum length to be 70 bytes.
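
A check of this kind might look like the sketch below. It is not the code from
the patch: the function name check_boundary and both constant names are
assumptions, and whether the patch rejects exactly 1024 bytes or only longer
boundaries is not stated, so the comparison used here is a guess.

```python
RFC_MAX_BOUNDARY_LENGTH = 70   # limit given by the RFC, for reference only
MAX_BOUNDARY_LENGTH = 1024     # assumed cutoff, based on the comment above


def check_boundary(boundary: bytes) -> bytes:
    """Return `boundary` unchanged, or raise ValueError if it is unusable."""
    if not boundary:
        raise ValueError("multipart boundary must not be empty")
    if len(boundary) > MAX_BOUNDARY_LENGTH:
        raise ValueError(
            "multipart boundary is %d bytes long; the limit is %d"
            % (len(boundary), MAX_BOUNDARY_LENGTH)
        )
    return boundary
```

Accepting boundaries longer than the RFC limit but shorter than the cutoff
keeps the check lenient toward clients that do not follow the RFC strictly.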

--
keywords: +patch
nosy: +rishi.maker.forum
Added file: http://bugs.python.org/file36895/issue1610654.patch

___
Python tracker 
<http://bugs.python.org/issue1610654>
___



[issue22351] NNTP constructor exception leaves socket for garbage collector

2014-10-10 Thread Rishi

Rishi added the comment:

Patch updated based on comments.

--
Added file: http://bugs.python.org/file36873/issue22351_2.patch

___
Python tracker 
<http://bugs.python.org/issue22351>
___



[issue22351] NNTP constructor exception leaves socket for garbage collector

2014-10-05 Thread Rishi

Rishi added the comment:

Patch updated to use just a plain exception.

--
Added file: http://bugs.python.org/file36819/issue22351_1.patch

___
Python tracker 
<http://bugs.python.org/issue22351>
___



[issue22351] NNTP constructor exception leaves socket for garbage collector

2014-10-05 Thread Rishi

Rishi added the comment:

Here is my attempt to fix this issue. This is my first patch ever :).
IMO checking for socket leaks in the constructor requires an actual server, so
I create an actual dummy server on localhost and test some error conditions
that are encountered by the constructor.
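
The general pattern can be sketched as below. This is not nntplib's code or the
attached patch: LineClient, GreetingError and start_dummy_server are
hypothetical names used only to show a constructor that closes its socket
before re-raising, plus a one-shot localhost server to exercise it.

```python
import socket
import threading


class GreetingError(Exception):
    """Hypothetical error raised when the server greeting is rejected."""


class LineClient:
    """Hypothetical client that cleans up if its constructor fails."""

    def __init__(self, host, port, timeout=5.0):
        self.sock = socket.create_connection((host, port), timeout)
        self.file = None
        try:
            self.file = self.sock.makefile("rwb")
            greeting = self.file.readline()
            if not greeting.startswith(b"200"):
                raise GreetingError(greeting)
        except Exception:
            # Close explicitly instead of leaving it to the garbage collector.
            if self.file is not None:
                self.file.close()
            self.sock.close()
            raise


def start_dummy_server(greeting):
    """Serve one localhost connection, send `greeting`, then shut down."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:
            conn.sendall(greeting)
        listener.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, thread
```

The file object is closed before the socket because the underlying descriptor
is only released once every object returned by makefile() has been closed.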

--
keywords: +patch
nosy: +rishi.maker.forum
Added file: http://bugs.python.org/file36811/issue22351.patch

___
Python tracker 
<http://bugs.python.org/issue22351>
___