aglinxinyuan opened a new issue, #5661:
URL: https://github.com/apache/texera/issues/5661

   ## Background
   
   `URLFetchUtil` in `common/workflow-operator` (`operator/source/fetcher/URLFetchUtil.scala`) currently lacks a dedicated unit spec. It is the small retry-loop utility used by scan-source operators when reading a URL: it wraps `URLConnection`, sets a random user-agent (via `RandomUserAgent`), and makes up to 5 attempts by default, treating any thrown exception as a failed attempt.
   
   ## Behavior to pin
   
   | Surface | Contract |
   | --- | --- |
   | Successful single attempt | returns `Some(InputStream)` on the first try and reads the body |
   | Default `retries = 5` | up to 5 total attempts when every attempt fails |
   | Explicit `retries = 1` | exactly one attempt |
   | Explicit `retries = 0` | returns `None` immediately (the loop body is never entered) |
   | All attempts fail | returns `None` (does NOT throw; the inner `try/catch` swallows exceptions) |
   | Stops at first success | when a later attempt succeeds, no further attempts are made |
   | Sets `User-Agent` request property | the value comes from `RandomUserAgent.getRandomUserAgent` (use a `URLStreamHandler` to capture headers and pin that the header is set, without pinning its value) |
   
   ## Scope
   
   - New spec file: `URLFetchUtilSpec.scala` (matches the `<srcClassName>Spec.scala` convention).
   - The test harness should NOT hit the network. Use a custom `java.net.URLStreamHandler` (or `URL.setURLStreamHandlerFactory` once-per-suite) that returns a `URLConnection` with controllable behavior: success returns a real `InputStream` (`ByteArrayInputStream`), failure throws `IOException`. Count attempts via a shared counter.
   - No production-code changes.
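
For the `URL.setURLStreamHandlerFactory` option mentioned in the scope, a guarded install might look roughly like the sketch below. `StubProtocol`, `installOnce`, and the `stubfetch` scheme are hypothetical names; the sketch assumes no other code in the test JVM has already installed a factory, since the JDK allows that call at most once per JVM.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}
import java.util.concurrent.atomic.AtomicBoolean

object StubProtocol {
  // Made-up scheme name; any value that is not a real protocol works.
  val Scheme = "stubfetch"
  private val installed = new AtomicBoolean(false)

  private val handler: URLStreamHandler = new URLStreamHandler {
    override protected def openConnection(u: URL): URLConnection =
      new URLConnection(u) {
        override def connect(): Unit = ()
        override def getInputStream: InputStream =
          new ByteArrayInputStream("stub-body".getBytes("UTF-8"))
      }
  }

  // Guard the call so repeated suite setup does not trigger the
  // "factory already defined" Error. Returning null for other schemes
  // lets the built-in handlers (http, file, jar, ...) keep working.
  def installOnce(): Unit =
    if (installed.compareAndSet(false, true)) {
      URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory {
        override def createURLStreamHandler(protocol: String): URLStreamHandler =
          if (protocol == Scheme) handler else null
      })
    }
}
```

Because the factory is JVM-global, passing the handler directly to the `URL` constructor is usually the less intrusive choice; the factory route mainly helps when the code under test builds its own `URL` from a string.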
   
   ## Hint
   
   `URL`-stream-handler injection pattern (Scala):
   
   ```scala
   import java.io.{ByteArrayInputStream, IOException, InputStream}
   import java.net.{URL, URLConnection, URLStreamHandler}
   import java.util.concurrent.atomic.AtomicInteger
   
   // `behavior` maps the 1-based attempt number to a failure or a response body.
   class StubHandler(behavior: Int => Either[IOException, Array[Byte]]) extends URLStreamHandler {
     // Shared counter: number of getInputStream calls across all connections.
     val attempts = new AtomicInteger(0)
   
     protected override def openConnection(u: URL): URLConnection = new URLConnection(u) {
       // No socket is opened; connecting is a no-op.
       override def connect(): Unit = ()
   
       override def getInputStream: InputStream = {
         val i = attempts.incrementAndGet()
         behavior(i) match {
           case Right(bytes) => new ByteArrayInputStream(bytes)
           case Left(e)      => throw e
         }
       }
     }
   }
   ```
   
   This pattern lets you compose attempt-by-attempt success / failure scripts without any DNS or socket activity.
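
As a rough illustration of such a script, the sketch below fails the first two attempts and succeeds on the third. The handler is repeated so the snippet compiles on its own. `fetchWithRetries` is a hypothetical stand-in for the retry loop described in the table, not the real `URLFetchUtil` code, and the `stub` scheme and `StubHandlerDemo` object are made up for the example.

```scala
import java.io.{ByteArrayInputStream, IOException, InputStream}
import java.net.{URL, URLConnection, URLStreamHandler}
import java.util.concurrent.atomic.AtomicInteger

// Same shape as the StubHandler in the hint, repeated for self-containment.
class StubHandler(behavior: Int => Either[IOException, Array[Byte]]) extends URLStreamHandler {
  val attempts = new AtomicInteger(0)
  override protected def openConnection(u: URL): URLConnection =
    new URLConnection(u) {
      override def connect(): Unit = ()
      override def getInputStream: InputStream =
        behavior(attempts.incrementAndGet()) match {
          case Right(bytes) => new ByteArrayInputStream(bytes)
          case Left(e)      => throw e
        }
    }
}

object StubHandlerDemo {
  // Hypothetical stand-in for the loop under test: stop at the first
  // success, swallow exceptions, return None when every attempt fails.
  def fetchWithRetries(url: URL, retries: Int): Option[InputStream] = {
    var result: Option[InputStream] = None
    var remaining = retries
    while (result.isEmpty && remaining > 0) {
      try {
        result = Some(url.openConnection().getInputStream)
      } catch {
        case _: Exception => ()
      }
      remaining -= 1
    }
    result
  }

  // Script: attempts 1 and 2 fail, attempt 3 succeeds.
  val handler = new StubHandler(i =>
    if (i < 3) Left(new IOException(s"attempt $i failed"))
    else Right("ok".getBytes("UTF-8"))
  )

  // The three-argument URL constructor binds the handler to this URL only;
  // the host is never resolved.
  val url = new URL(null, "stub://example/data", handler)

  val body: Option[String] =
    fetchWithRetries(url, retries = 5).map(in => new String(in.readAllBytes(), "UTF-8"))
}
```

After the run, `handler.attempts.get` gives the number of attempts actually made, which is what the "stops at first success" and "default `retries = 5`" rows would assert on.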


